Personalized Visualization Recommendation
XIN QIAN, University of Maryland, USA
RYAN A. ROSSI, Adobe Research, USA
FAN DU, Adobe Research, USA
SUNGCHUL KIM, Adobe Research, USA
EUNYEE KOH, Adobe Research, USA
SANA MALIK, Adobe Research, USA
TAK YEON LEE, Adobe Research, USA
NESREEN K. AHMED, Intel Labs, USA

Visualization recommendation work has focused solely on scoring visualizations based on the underlying dataset, and not the actual user and their past visualization feedback. These systems recommend the same visualizations for every user, despite that the underlying user interests, intent, and visualization preferences are likely to be fundamentally different, yet vitally important. In this work, we formally introduce the problem of personalized visualization recommendation and present a generic learning framework for solving it. In particular, we focus on recommending visualizations personalized for each individual user based on their past visualization interactions (e.g., viewed, clicked, manually created) along with the data from those visualizations. More importantly, the framework can learn from visualizations relevant to other users, even if the visualizations are generated from completely different datasets. Experiments demonstrate the effectiveness of the approach as it leads to higher quality visualization recommendations tailored to the specific user intent and preferences. To support research on this new problem, we release our user-centric visualization corpus consisting of 17.4k users exploring 94k datasets with 2.3 million attributes and 32k user-generated visualizations.

Additional Key Words and Phrases: Personalized visualization recommendation, user-centric visualization recommendation, deep learning
ACM Reference Format:
Xin Qian, Ryan A. Rossi, Fan Du, Sungchul Kim, Eunyee Koh, Sana Malik, Tak Yeon Lee, and Nesreen K. Ahmed. 2021. Personalized Visualization Recommendation. 37 pages. https://doi.org/0000001.0000001
1 INTRODUCTION

With massive datasets becoming ubiquitous, visualization recommendation systems have become increasingly important. These systems have the promise of enabling rapid visual analysis and exploration of such datasets. However, existing end-to-end visualization recommendation systems output a long list of visualizations based solely on simple visual rules [Wongsuphasawat et al. 2015, 2017]. These systems lack the ability to recommend visualizations that are personalized to the specific user and the tasks that are important to them. This makes it both time-consuming and difficult for users to effectively explore such datasets and find meaningful visualizations.

Recommending visualizations that are personalized to a specific user is an important unsolved problem. Prior work on visualization recommendation has focused mainly on rule-based or ML-based approaches that are completely agnostic to the user of the system. In particular, these systems recommend the same ranked list of visualizations for every user, despite that the underlying user interests, intent, and visualization preferences are fundamentally different, yet vitally important for recommending useful and interesting visualizations for a specific user. The rule-based methods use simple visual rules to score visualizations, whereas the existing ML-based methods have focused solely on classifying design choices [Hu et al. 2019] or ranking such design choices [Moritz et al. 2018] using a corpus of visualizations that are not tied to a user. Neither of these existing classes of visualization recommendation systems focuses on modeling individual user behavior or personalizing for individual users, which is at the heart of our work.

In this work, we introduce a new problem of personalized visualization recommendation and propose an expressive framework for solving it. The problem studied in this work is as follows: given a set of $n$ users where each user has their own specific set of datasets, and each of the user datasets contains a set of relevant visualizations (i.e., visualizations a specific user has interacted with in the past, either implicitly by clicking/viewing or explicitly by liking or adding the visualization to their favorites or a dashboard they are creating), the problem of personalized visualization recommendation is to learn an individual recommendation model for every user such that when a user selects a possibly new dataset of interest, we can apply the model for that specific user to recommend the top most relevant visualizations that are most likely to be of interest to them (despite there being no previous implicit/explicit feedback on any of the visualizations from the new dataset). Visualizations are fundamentally tied to a dataset as they consist of (i) the set of visual design choices (e.g., chart-type, color/size, x/y) and (ii) the subset of data attributes from the full dataset used in the visualization. Therefore, how can we develop a learning framework for solving the personalized visualization recommendation problem that is able to learn from other users and their relevant visualizations, even when those visualizations are from tens of thousands of completely different datasets with no shared attributes?

There are two important and fundamental issues at the heart of the personalized visualization recommendation problem. First, since visualizations are defined based on the attributes within a single specific dataset, there is no way to leverage the visualization preferences of users across different datasets. Second, since each user often has their own dataset of interest (not shared by other users), there is no way to leverage user preferences across different datasets. In this work, we address both problems.
Notably, the framework proposed in this paper naturally generalizes to the following problem settings: (a) a single dataset with a single set of visualizations shared among all users, and (b) tens of thousands of datasets that are not shared between users, where each dataset of interest to a user gives rise to a completely different set of possible visualizations. However, the existing work cannot be used to solve the new problem formulation that relaxes the single-dataset assumption to make it more general and widely applicable.

In the problem formulation of personalized visualization recommendation, each user can have their own set of datasets, and since each visualization represents a series of design choices and data (i.e., attributes tied to a specific dataset), this gives rise to a completely disjoint set of visualizations for each user. Hence, there is no way to directly leverage visualization feedback from other users, since the visualizations are from different datasets. Furthermore, visualizations are dataset specific, since they are generated based on the underlying dataset, and therefore any feedback from a user cannot be directly leveraged for making better recommendations for other users and datasets. To understand the difficulty of the proposed problem of personalized visualization recommendation, the equivalent problem with regard to traditional recommender systems would be as if each user on Amazon (or Netflix) had their own separate set of disjoint products (or movies) that no other user could see and provide feedback on. In such a setting, how can we then use feedback from other users? Furthermore, given a single dataset uploaded by some user, there is an exponential number of possible visualizations that can be generated from it. This implies that even if there are some users interested in a single dataset, the amount of preferences by those users is likely to be extremely small compared to the exponential number of possible visualizations that can be generated and preferred by such users.

To overcome these issues and make it possible to solve the personalized visualization recommendation problem, we introduce two new models and representations that enable learning from dataset and visualization preferences across different datasets and users, respectively. First, we propose a novel model and representation that encodes users and their interactions with attributes (i.e., attributes in any dataset), and we map every attribute to a shared $K$-dimensional meta-feature space that enables the model to learn from user-level data preferences across all the different datasets of the users. Most importantly, the shared meta-feature space is independent of the specific datasets, and the meta-features represent general functions of an arbitrary attribute, independent of the user or dataset from which it arises. This enables the model to learn from user-level data preferences, despite those preferences being on entirely different datasets. Second, we propose a novel user-level visual preference graph model for visualization recommendation using the proposed notion of a visualization configuration, which enables learning from user-level visual preferences across different datasets and users. Importantly, the graph model is able to directly learn from user-level visual preferences across different datasets. This model encodes users and their visual-configurations (sets of design choices).
Since each visual-configuration node represents a set of design choices that are by definition not tied to a user-specific dataset, the proposed model can use this user-level visual graph to infer and make connections between other similar visual-configurations that are also likely to be useful to that user. This new graph model is critical since it allows the learning component to learn from user-level visual preferences (which are visual-configurations) across the different datasets and users. Without this novel component, there would be no way to learn from other users' visual preferences (sets of design choices).

This work makes the following key contributions:

• Problem Formulation:
We introduce and formulate the problem of personalized visualization recommendation, which learns a personalized visualization recommendation model for every individual user based on their past visualization feedback and on the feedback of other users and their relevant visualizations from completely different datasets. Our formulation removes the unrealistic assumption of a single dataset shared across all users (and thus that there exists a single set of dataset-specific visualizations shared among all users). To solve this problem, the model must be able to learn from the visualization and data preferences of many users across tens of thousands of different datasets.

• Framework:
We propose a flexible framework that expresses a class of methods for the personalized visualization recommendation problem. To solve this new problem, we introduce new graph representations and models that enable learning from the visualization and data preferences of users despite them being on entirely different datasets. More importantly, the proposed framework is able to exploit the visualization and data preferences of users across tens of thousands of different datasets.

• Effectiveness:
The extensive experiments demonstrate the importance and effectiveness of learning personalized visualization recommendation models for each individual user. Notably, our personalized models perform significantly better than SOTA baselines, with a mean improvement of 29.8% and 64.9% for HIT@5 and NDCG@5, respectively. Furthermore, the deep personalized visualization recommendation models are shown to perform even better. Finally, comprehensive ablation studies are performed to understand the effectiveness of the different learning components.
First, we introduce the new problem of visualization recommendation in Section 2, which learns a personalized model for each of the $n$ individual users by leveraging a large collection of datasets and relevant visualizations from each of the datasets in the collection. Notably, the learning of the individual user models is able to exploit the preferences of other users (even if those preferences are on a completely different dataset), including the data attributes used in a visualization, visual design choices, and actual visualizations generated, despite that no other user may have used the underlying dataset of interest. In Section 3, we propose a computational framework for solving the new problem of personalized visualization recommendation. Further, we also propose deep personalized visualization recommendation models in Section 4 that are able to learn complex non-linear functions between the embeddings of the users, visualization-configurations, datasets, and the data attributes used in the visualizations. Next, Section 5 describes the user-centric visualization corpus we created and made publicly accessible for studying this problem. Then, Section 6 provides a comprehensive and systematic evaluation of the proposed approach and framework for the personalized visualization recommendation problem, while Section 7 discusses related work. Finally, Section 8 concludes with a summary of the key findings and briefly discusses directions for future work on this new problem.

2 PROBLEM FORMULATION

In this section, we formally introduce the
Personalized Visualization Recommendation problem. The personalized visualization recommendation problem has two main parts: (1) training a personalized visualization recommendation model for every user $i \in [n]$ (Section 2.2), and (2) leveraging the user-personalized model to recommend personalized visualizations based on the user's past dataset and visualization feedback/preferences (Section 2.3).

(1) Personalized Model Training (Sec. 2.2):
Given a user-level training visualization corpus $\mathcal{D} = \{(\mathcal{X}_i, \mathbb{V}_i)\}_{i=1}^{n}$ consisting of $n$ users and their corresponding datasets of interest $\mathcal{X}_i = \{\mathbf{X}_{i1}, \ldots, \mathbf{X}_{ij}, \ldots\}$, as well as their relevant sets of visualizations $\mathbb{V}_i = \{\mathcal{V}_{i1}, \ldots, \mathcal{V}_{ij}, \ldots\}$ for those datasets, we first learn a user-level personalized model $\mathcal{M}_i$ from the training corpus $\mathcal{D}$ that scores the effective visualizations for user $i$ highly while assigning low scores to visualizations that are likely not preferred by the user.

(2) Recommending Personalized Visualizations (Sec. 2.3):
Given a user $i \in [n]$ and a dataset $\mathbf{X}_{ij}$ of interest to user $i$, we use the trained personalized visualization recommendation model $\mathcal{M}_i$ for user $i$ to generate, score, and recommend the top visualizations of interest to user $i$ for dataset $\mathbf{X}_{ij}$. Note that we naturally support both the case when the dataset $\mathbf{X}_{ij} \notin \mathcal{X}_i$ is new, and the case when the dataset $\mathbf{X}_{ij} \in \mathcal{X}_i$ is not new and we have one or more previous pieces of user feedback about the visualizations the user likely prefers from that dataset.

The fundamental difference between the ML-based visualization recommendation problem introduced in [Qian et al. 2020] and the personalized visualization recommendation problem described above is that the personalized problem focuses on modeling the behavior, data, and visualization preferences of individual users. Since local visualization recommendation models are learned for every user $i \in [n]$ (as opposed to training a global visualization recommendation model), it becomes important to leverage every single piece of feedback from the users. For instance, global visualization recommendation models essentially ignore the notion of a user, and can therefore leverage all available training data to learn the best global visualization recommendation model. However, personalized visualization recommendation models explicitly leverage specific user feedback to learn the best personalized local model for every user $i \in [n]$, and there is of course far less feedback from individual users.

2.1 Implicit and Explicit User Feedback

In this work, relevant visualizations $\mathcal{V}_{ij} \in \mathbb{V}_i$ for a specific user $i$ and dataset $\mathbf{X}_{ij} \in \mathcal{X}_i$ are defined generally, as the term relevant may refer to visualizations that a user clicked, liked, or generated, among many other user actions that demonstrate positive feedback towards a visualization. In terms of personalized visualization recommendation, there are two general types of user feedback: implicit and explicit user feedback. Implicit user visualization feedback corresponds to user feedback that is not explicitly stated and includes user actions such as when a user clicks on a visualization or hovers over a visualization for more than a specific time. Conversely, explicit user feedback on a visualization refers to feedback that is explicitly stated about a visualization, such as when a user explicitly likes a visualization or generates a visualization. Naturally, implicit user feedback is available in larger quantities than explicit user feedback. However, implicit user feedback is not as strong a signal: e.g., a user clicking a visualization is a weaker signal than a user explicitly liking a visualization.

We propose two different types of user preferences (with implicit and explicit user feedback) that are important for learning personalized visualization recommendation models for individual users: the data preferences and the visual preferences of each individual user. For learning the data and visual preferences of a user, there is both implicit and explicit user feedback that can be used for developing personalized visualization recommender systems. There is naturally both implicit and explicit user feedback regarding the data preferences of users. Explicit user feedback about the data preferences of a user is a far stronger signal than implicit user feedback; however, there is typically far more implicit user feedback available for learning than explicit feedback from the user.

• Implicit Data Preferences of Users.
An example of implicit feedback w.r.t. the data preferences of the user is when a user clicks (or hovers over) a visualization that uses two attributes x and y from some arbitrary user-selected dataset. We can then extract the user's data preferences from the visualization by encoding the two attributes that were used in the visualization preferred by that user.

• Explicit Data Preferences of Users.
Similarly, an example of explicit feedback w.r.t. the data preferences of the user is when a user explicitly likes a visualization (or adds a visualization to their dashboard) that uses two attributes x and y from some arbitrary user-selected dataset. In this work, we use another form of explicit feedback based on a user-generated visualization and the attributes (data) used in the generated visualization. This is a form of explicit feedback, since the user explicitly selects the attributes and creates a visualization using them (as opposed to clicking on a visualization automatically generated by a system).

Besides using implicit and explicit feedback provided by the user based on the click or like of a visualization and the data used in it, we can also leverage even more direct feedback about a user's data preferences. For instance, many visualization recommender systems allow users to select an attribute of interest to use in the recommended visualizations. As such, we can naturally leverage any feedback of this type as well.

In terms of the visual preferences of users, there is both implicit and explicit user feedback that can be used to learn a better personalized visualization recommendation model for individual users.

• Implicit Visual Preferences of Users.
An example of implicit feedback w.r.t. the visual preferences of the user is when a user clicks (or hovers over) a visualization from some arbitrary user-selected dataset. We can then extract the user's visual preferences from the visualization and appropriately encode them for learning the visual preferences of the individual user.

• Explicit Visual Preferences of Users.
Similarly, an example of explicit feedback w.r.t. the visual preferences of the user is when a user explicitly likes a visualization (or adds a visualization to their dashboard). Just as before, we can then extract the visual preferences of the user from the visualization (mark/chart type, x-type, y-type, color, size, x-aggregate, and so on) and leverage the individual visual preferences, or a combination of them, for learning the user-specific personalized visualization recommendation model.
2.2 Personalized Model Training

Given user log data
$$\mathcal{D} = \{(\mathcal{X}_1, \mathbb{V}_1), \ldots, (\mathcal{X}_i, \mathbb{V}_i), \ldots, (\mathcal{X}_n, \mathbb{V}_n)\} = \{(\mathcal{X}_i, \mathbb{V}_i)\}_{i=1}^{n} \quad (1)$$
where for each user $i \in [n]$ we have the set of datasets of interest to that user, denoted $\mathcal{X}_i$, along with the sets of relevant visualizations $\mathbb{V}_i$ generated by user $i$ for every dataset $\mathbf{X}_{ij} \in \mathcal{X}_i$. More specifically,
$$\mathbb{V}_i = \{\mathcal{V}_{i1}, \ldots, \mathcal{V}_{ij}, \ldots\} \quad \text{and} \quad \mathcal{V}_{ij} = \{V_{ij1}, \ldots, V_{ijk}, \ldots\} \quad (2)$$
$$\mathcal{X}_i = \{\mathbf{X}_{i1}, \ldots, \mathbf{X}_{ij}, \ldots\} \quad \text{and} \quad \mathbf{X}_{ij} = [\,\mathbf{x}_{ij1} \;\; \mathbf{x}_{ij2} \;\; \cdots\,] \quad (3)$$
where $\mathbf{x}_{ijk}$ is the $k$-th attribute (column vector) of $\mathbf{X}_{ij}$. Hence, the number of attributes in $\mathbf{X}_{ij}$ has no relation to the number of relevant visualizations $|\mathcal{V}_{ij}|$ that a user $i$ preferred for that dataset. For a single user $i$, the number of user-preferred visualizations across all datasets of interest to that user is
$$v_i = \sum_{\mathcal{V}_{ij} \in \mathbb{V}_i} |\mathcal{V}_{ij}| \quad (4)$$
where $\mathcal{V}_{ij}$ is the set of visualizations preferred by user $i$ from dataset $j$. Thus, the total number of user-generated visualizations across all users and datasets is
$$v = \sum_{i=1}^{n} \sum_{\mathcal{V}_{ij} \in \mathbb{V}_i} |\mathcal{V}_{ij}| \quad (5)$$
For simplicity, let $V_{ijk} \in \mathcal{V}_{ij} = \{V_{ij1}, \ldots, V_{ijk}, \ldots\}$ denote the $k$-th visualization generated by user $i$ from dataset $j$ (that is, $\mathbf{X}_{ij} \in \mathcal{X}_i$), specifically using the subset of attributes $\mathbf{X}^{(k)}_{ij}$ from the dataset $\mathbf{X}_{ij}$. Further, every user $i \in [n]$ is associated with a set of datasets $\mathcal{X}_i = \{\mathbf{X}_{i1}, \ldots, \mathbf{X}_{ij}, \ldots\}$ of interest. Let $\mathbf{X}_{ij}$ be the $j$-th dataset of interest for user $i$ and let $|\mathbf{X}_{ij}|$ denote the number of attributes (columns) of the dataset matrix $\mathbf{X}_{ij}$. Then the number of attributes across all datasets of interest to user $i$ is
$$m_i = \sum_{\mathbf{X}_{ij} \in \mathcal{X}_i} |\mathbf{X}_{ij}| \quad (6)$$
and the number of attributes across all $n$ users and all their datasets is
$$m = \sum_{i=1}^{n} \sum_{\mathbf{X}_{ij} \in \mathcal{X}_i} |\mathbf{X}_{ij}| \quad (7)$$
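To make the notation concrete, the following minimal Python sketch computes the counts in Eqs. (4)–(7) over a toy corpus. The nested-dictionary layout and field names are illustrative assumptions for this example only, not the format of the released corpus.

```python
# Toy corpus: {user: {dataset: {"attrs": [...], "vis": [...]}}} (illustrative).
corpus = {
    "u1": {"d1": {"attrs": ["price", "year"], "vis": ["v1", "v2"]},
           "d2": {"attrs": ["age", "income", "state"], "vis": ["v3"]}},
    "u2": {"d3": {"attrs": ["temp"], "vis": ["v4"]}},
}

def num_visualizations(user):
    # v_i = sum over datasets of |V_ij|  (Eq. 4)
    return sum(len(d["vis"]) for d in corpus[user].values())

def num_attributes(user):
    # m_i = sum over datasets of |X_ij|  (Eq. 6)
    return sum(len(d["attrs"]) for d in corpus[user].values())

v = sum(num_visualizations(u) for u in corpus)  # Eq. (5)
m = sum(num_attributes(u) for u in corpus)      # Eq. (7)
print(v, m)  # 4 6
```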
Definition 1 (Space of Attribute Combinations). Given an arbitrary dataset matrix $\mathbf{X}_{ij}$, let $\mathbb{X}_{ij}$ denote the space of attribute combinations of $\mathbf{X}_{ij}$, defined as
$$\Sigma : \mathbf{X}_{ij} \to \mathbb{X}_{ij}, \;\; \text{s.t.} \quad (8)$$
$$\mathbb{X}_{ij} = \{\mathbf{X}^{(1)}_{ij}, \ldots, \mathbf{X}^{(k)}_{ij}, \ldots\}, \quad (9)$$
where $\Sigma$ is an attribute combination generation function and every $\mathbf{X}^{(k)}_{ij} \in \mathbb{X}_{ij}$ is a different subset (combination) consisting of one or more attributes from $\mathbf{X}_{ij}$.
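As an illustration of Definition 1, the sketch below enumerates attribute combinations with itertools. Treating $\Sigma$ as full subset enumeration (optionally capped by a maximum subset size) is an assumption made here for illustration, since the full space grows exponentially.

```python
from itertools import combinations

def attribute_combinations(attrs, max_size=None):
    """One possible instance of the generation function Σ (Def. 1):
    enumerate all non-empty subsets of a dataset's attributes, optionally
    capped, since the full space has 2^|X| - 1 elements."""
    kmax = max_size or len(attrs)
    for k in range(1, kmax + 1):
        for subset in combinations(attrs, k):
            yield subset

# A dataset with 3 attributes has 2^3 - 1 = 7 attribute combinations.
print(list(attribute_combinations(["price", "year", "region"])))
```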
Property 1. Let $|\mathbf{X}_{ij}|$ and $|\mathbf{X}_{ik}|$ denote the number of attributes (columns) of two arbitrary datasets $\mathbf{X}_{ij}$ and $\mathbf{X}_{ik}$ of user $i$. If $|\mathbf{X}_{ij}| > |\mathbf{X}_{ik}|$, then $|\mathbb{X}_{ij}| > |\mathbb{X}_{ik}|$.

It is straightforward to see that if $|\mathbf{X}_{ij}| > |\mathbf{X}_{ik}|$, then the number of attribute combinations of $\mathbf{X}_{ij}$, denoted $|\mathbb{X}_{ij}|$, is larger than the number of different attribute subsets that can be generated from $\mathbf{X}_{ik}$, denoted $|\mathbb{X}_{ik}|$. Property 1 is important as it characterizes the space of attribute combinations/subsets for a given dataset $\mathbf{X}_{ij}$, and can therefore be used to understand the corresponding space of possible visualizations that can be generated from a given dataset, as the two are tied.

In this work, we assume a visualization is specified using some grammar such as Vega-Lite [Satyanarayan et al. 2016]. Therefore, the data mapping and design choices of the visualization are encoded in JSON (or a JSON-like format) and can easily render a visualization. A visualization configuration $\mathbf{C}$ (design choices) and the data attributes $\mathbf{X}^{(k)}_{ij}$ selected from a dataset $\mathbf{X}_{ij}$ are everything necessary to generate a visualization $V = (\mathbf{X}^{(k)}_{ij}, \mathbf{C})$. Hence, the tuple $(\mathbf{X}^{(k)}_{ij}, \mathbf{C})$ defines a unique visualization $V$ that leverages the subset of attributes $\mathbf{X}^{(k)}_{ij}$ from dataset $\mathbf{X}_{ij}$ along with the visualization configuration $\mathbf{C} \in \mathcal{C}$.

Definition 2 (Visualization Configuration).
Given a visualization $V$ generated using a subset of attributes $\mathbf{X}^{(k)}_{ij}$ from dataset $\mathbf{X}_{ij}$, we define a function
$$\Gamma : V \to \mathbf{C} \quad (10)$$
where $\Gamma$ maps every data-dependent design choice of the visualization to its corresponding type (i.e., the attribute mapped to the x-axis of the visualization $V$ is replaced with its general type, such as quantitative, nominal, ordinal, temporal, etc.). The resulting visualization configuration $\mathbf{C}$ is an abstraction of the visualization $V$, in the sense that all the data attribute bindings have been abstracted and replaced with their general data attribute type. Hence, $\mathbf{C}$ is an abstraction of $V$.

Definition 3 (Space of Visualization Configurations).
Let $\mathcal{C}$ denote the space of all visualization configurations such that a visualization configuration $\mathbf{C}_{ik} \in \mathcal{C}$ defines an abstraction of a visualization where, for each visual design choice (x, y, marker-type, color, size, etc.) that maps to an attribute in some dataset $\mathbf{X}_{ij}$, we replace it with its type, such as quantitative, nominal, ordinal, temporal, or some other general property characterizing the attribute that can be selected. Therefore, visualization configurations are essentially visualizations without any attributes (data), or visualization abstractions that are by definition data-independent.
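The following sketch illustrates one possible implementation of the abstraction function $\Gamma$ (Def. 2) over a Vega-Lite-style specification. The dictionary layout and the attr_types lookup are hypothetical stand-ins for however attribute types are inferred in practice.

```python
# A minimal sketch of Γ: replace each data-bound field with its general
# type, keeping data-independent design choices (mark, aggregate, ...) as-is.

def gamma(vis_spec, attr_types):
    config = {"mark": vis_spec["mark"]}
    for channel, enc in vis_spec["encoding"].items():
        config[channel] = {
            "type": attr_types[enc["field"]],   # e.g. nominal, quantitative
            "aggregate": enc.get("aggregate"),  # data-independent, kept
        }
    return config

spec = {"mark": "bar",
        "encoding": {"x": {"field": "state"},
                     "y": {"field": "income", "aggregate": "mean"}}}
types = {"state": "nominal", "income": "quantitative"}
print(gamma(spec, types))
# {'mark': 'bar', 'x': {'type': 'nominal', 'aggregate': None},
#  'y': {'type': 'quantitative', 'aggregate': 'mean'}}
```

Note that two visualizations built from completely different datasets can map to the same configuration under $\Gamma$, which is exactly what makes configurations shareable across users.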
Property 2. Every visualization configuration $\mathbf{C}_{ik} \in \mathcal{C}$ is independent of any data matrix $\mathbf{X}$ (by Definition 3).

The above implies that $\mathbf{C}_{ik} \in \mathcal{C}$ can potentially arise from any arbitrary dataset and is therefore not tied to any specific dataset, since visualization configurations are general abstractions where the data bindings have been replaced with their general type; e.g., if x/y in some visualization mapped to an attribute in $\mathbf{X}$, then it is replaced by its type (i.e., ordinal, quantitative, categorical, etc.). A visualization configuration and the attributes selected from a dataset are everything necessary to generate a visualization. The size of the space of visualization configurations is large, since visualization configurations come from all possible combinations of design choices and their values.

Definition 4 (Space of Visualizations of $\mathbf{X}_{ij}$). Given an arbitrary dataset matrix $\mathbf{X}_{ij}$, we define $\mathcal{V}^{\star}_{ij}$ as the space of all possible visualizations that can be generated from $\mathbf{X}_{ij}$. More formally, the space of visualizations $\mathcal{V}^{\star}_{ij}$ is defined with respect to a dataset $\mathbf{X}_{ij}$ and the space of visualization configurations $\mathcal{C}$,
$$\mathbb{X}_{ij} = \Sigma(\mathbf{X}_{ij}) = \{\mathbf{X}^{(1)}_{ij}, \ldots, \mathbf{X}^{(k)}_{ij}, \ldots\} \quad (11)$$
$$\xi : \mathbb{X}_{ij} \times \mathcal{C} \to \mathcal{V}^{\star}_{ij} \quad (12)$$
where $\mathbb{X}_{ij}$ is the set of all possible attribute combinations of $\mathbf{X}_{ij}$ (Def. 1). More succinctly, $\xi : \Sigma(\mathbf{X}_{ij}) \times \mathcal{C} \to \mathcal{V}^{\star}_{ij}$, and therefore $\xi(\Sigma(\mathbf{X}_{ij}), \mathcal{C}) = \mathcal{V}^{\star}_{ij}$.

The space of all visualizations $\mathcal{V}^{\star}_{ij}$ is determined entirely by the underlying dataset, and therefore remains the same for all $n$ users. The difference in our personalized visualization recommendation problem is the relevance of each visualization in the space of all possible visualizations generated from an arbitrary dataset. Given a subset of attributes $\mathbf{X}^{(k)}_{ij} \in \mathbb{X}_{ij}$ from dataset $\mathbf{X}_{ij}$ and a visualization configuration $\mathbf{C} \in \mathcal{C}$, $\xi(\mathbf{X}^{(k)}_{ij}, \mathbf{C}) \in \mathcal{V}^{\star}_{ij}$ is the corresponding visualization.

Importantly, fix $\mathcal{C}$ and let $\mathbf{X} \neq \mathbf{Y}$ be such that $\mathbf{x}_i \neq \mathbf{y}_j$ for all $i, j$ (no shared attributes); then $\xi(\Sigma(\mathbf{X}), \mathcal{C}) \cap \xi(\Sigma(\mathbf{Y}), \mathcal{C}) = \emptyset$. This implies that the space of possible visualizations that can be generated is entirely dependent on the dataset (not the user). Hence, for any two datasets $\mathbf{X}$ and $\mathbf{Y}$ without any shared attributes between them, the sets of visualizations that can be generated from $\mathbf{X}$ and $\mathbf{Y}$ are completely different,
$$\xi(\Sigma(\mathbf{X}), \mathcal{C}) \cap \xi(\Sigma(\mathbf{Y}), \mathcal{C}) = \emptyset$$
This has important consequences for the new problem of personalized visualization recommendation. Since it is unlikely that any two users care about the same underlying dataset, and even if they did, it is far more unlikely that they have any relevant visualizations in common (just w.r.t. the exponential size of the visualization space for a single dataset with a reasonable number of attributes), it is neither possible nor practical to leverage the relevant visualizations of a user directly. Instead, we need to decompose a visualization $V$ into its more meaningful components, such as: (i) the characteristics of the data attributes $\mathbf{X}^{(k)}_{ij}$ used in the visualization, and (ii) the visual design choices (chart-type/mark, color, size, and so on).

Definition 5 (Relevant Visualizations of User $i$ and Dataset $\mathbf{X}_{ij}$). Let $\mathcal{V}_{ij} \in \mathbb{V}_i$ define the set of relevant (positive) visualizations for user $i$ with respect to dataset $\mathbf{X}_{ij}$. Therefore, $\mathbb{V}_i = \bigcup_{\mathbf{X}_{ij} \in \mathcal{X}_i} \mathcal{V}_{ij}$, where $\mathbb{V}_i$ is the set of all relevant visualizations across all datasets $\mathcal{X}_i$ of interest to user $i$.
Definition 6 (Non-relevant Visualizations of User $i$ and Dataset $\mathbf{X}_{ij}$). For a user $i$, let $\mathcal{V}^{\star}_{ij}$ denote the space of all visualizations that arise from the $j$-th dataset $\mathbf{X}_{ij}$, such that the relevant (positive) visualizations $\mathcal{V}_{ij}$ satisfy $\mathcal{V}_{ij} \subseteq \mathcal{V}^{\star}_{ij}$; then the space of non-relevant visualizations for user $i$ on dataset $\mathbf{X}_{ij}$ is $\mathcal{V}^{-}_{ij} = \mathcal{V}^{\star}_{ij} \setminus \mathcal{V}_{ij}$, which follows from $\mathcal{V}^{-}_{ij} \cup \mathcal{V}_{ij} = \mathcal{V}^{\star}_{ij}$.

We denote by $Y_{ijk}$ the ground-truth label of a visualization $V_{ijk} \in \mathcal{V}^{\star}_{ij}$, where $Y_{ijk} = 1$ if $V_{ijk} \in \mathcal{V}_{ij}$ and $Y_{ijk} = 0$ otherwise. The goal is then to learn a personalized visualization recommendation model $\mathcal{M}_i$ for user $i$ from a large user-centric visualization training corpus $\mathcal{D}$.

Definition 7 (Training a Personalized Visualization Recommendation Model).
Given the set of training datasets and relevant visualizations $\mathcal{D} = \{(\mathcal{X}_i, \mathbb{V}_i)\}_{i=1}^{n}$, the goal is to learn a personalized visualization recommendation model $\mathcal{M}_i$ for user $i$ by solving the following general objective function,
$$\arg\min_{\mathcal{M}_i} \; \sum_{j=1}^{|\mathcal{X}_i|} \; \sum_{(\mathbf{X}^{(k)}_{ij},\, \mathbf{C}_{ijk}) \in \mathcal{V}_{ij} \cup \widehat{\mathcal{V}}^{-}_{ij}} \mathbb{L}\Big(Y_{ijk} \;\Big|\; \Psi(\mathbf{X}^{(k)}_{ij}),\, f(\mathbf{C}_{ijk}),\, \mathcal{M}_i\Big), \quad i = 1, \ldots, n \quad (13)$$
where $\mathbb{L}$ is the loss function and $Y_{ijk} \in \{0, 1\}$ is the ground-truth label of the $k$-th visualization $V_{ijk} = (\mathbf{X}^{(k)}_{ij}, \mathbf{C}_{ijk}) \in \mathcal{V}_{ij} \cup \widehat{\mathcal{V}}^{-}_{ij}$ for dataset $\mathbf{X}_{ij} \in \mathcal{X}_i$ of user $i$. Further, $\mathbf{X}^{(k)}_{ij} \subseteq \mathbf{X}_{ij}$ is the subset of attributes used in the visualization. In Eq. 13, $\Psi$ and $f$ are general functions over the subset of attributes $\mathbf{X}^{(k)}_{ij} \subseteq \mathbf{X}_{ij}$ and the visualization configuration $\mathbf{C}_{ijk}$ of the visualization $V_{ijk} = (\mathbf{X}^{(k)}_{ij}, \mathbf{C}_{ijk}) \in \mathcal{V}^{-}_{ij} \cup \mathcal{V}_{ij}$, respectively.

For learning individual models $\mathcal{M}_i$ for every user $i \in [n]$, we can also leverage the visualization and data preferences of other users. The simplest and most straightforward situation is when there is another user $i' \in [n]$ with a set of relevant visualizations that use attributes from the same exact dataset, hence $|\mathcal{X}_i \cap \mathcal{X}_{i'}| > 0$. While the above strict assumption is convenient as it makes the problem far simpler, it is unrealistic in practice (and not very useful) to assume there exists a single dataset of interest to all users. Therefore, we designed the approach to be able to learn from visualizations preferred by other users on completely different datasets.
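For concreteness, below is a hypothetical sketch of the per-user objective in Eq. (13), using a logistic (cross-entropy) loss as one concrete stand-in for the generic loss $\mathbb{L}$. The names model_i, psi, and f are placeholders for the learned scoring model and the feature maps $\Psi$ and $f$; how negatives are sampled is left open, as in the definition.

```python
import numpy as np

def loss(y, score):
    # Binary cross-entropy over the sigmoid of the personalized score.
    p = 1.0 / (1.0 + np.exp(-score))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def user_objective(model_i, positives, sampled_negatives, psi, f):
    """Sum of losses over relevant visualizations (y=1) and a sampled set
    of non-relevant ones (y=0), i.e., over V_ij ∪ V̂⁻_ij in Eq. (13)."""
    total = 0.0
    for y, examples in ((1, positives), (0, sampled_negatives)):
        for attrs, config in examples:  # each example is (X^(k)_ij, C_ijk)
            total += loss(y, model_i(psi(attrs), f(config)))
    return total
```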
Table 1. Summary of notation. Matrices are bold upright roman letters; vectors are bold lowercase letters.

$\mathcal{D}$ — user log data $\mathcal{D} = \{(\mathcal{X}_i, \mathbb{V}_i)\}_{i=1}^{n}$ consisting of a set of datasets $\mathcal{X}_i$ for every user $i \in [n]$ and the sets $\mathbb{V}_i$ of relevant visualizations for each of those datasets
$\mathcal{X}_i$ — set of datasets (data matrices) of interest to user $i$, where $\mathcal{X}_i = \{\mathbf{X}_{i1}, \ldots, \mathbf{X}_{ij}, \ldots\}$
$\mathbf{X}_{ij}$ — the $j$-th dataset (data matrix) of interest to user $i$
$\mathbb{V}_i$ — sets of visualizations relevant to user $i$, where $\mathbb{V}_i = \{\mathcal{V}_{i1}, \ldots, \mathcal{V}_{ij}, \ldots\}$
$\mathcal{V}_{ij}$ — set of visualizations relevant to (generated by) user $i$ for dataset $j$ ($\mathbf{X}_{ij} \in \mathcal{X}_i$), where $\mathcal{V}_{ij} = \{\ldots, V_{ijk}, \ldots\}$
$V = (\mathbf{X}^{(k)}, \mathbf{C})$ — a visualization $V$ consisting of the subset of attributes $\mathbf{X}^{(k)}$ from some dataset $\mathbf{X}$ and the visual-configuration (design choices) $\mathbf{C}$
$\mathcal{C}$ — set of visual-configurations, where $\mathbf{C} \in \mathcal{C}$ represents the visualization design choices for a single visualization $V$ such as the chart-type, x-axis, y-axis, color, and so on
$\mathbb{X}_{ij}$ — space of attribute combinations/subsets $\mathbb{X}_{ij} = \{\mathbf{X}^{(1)}_{ij}, \ldots, \mathbf{X}^{(k)}_{ij}, \ldots\}$ of dataset $\mathbf{X}_{ij}$
$n$ — number of users
$m$ — number of attributes (columns, variables) across all datasets, $m = \sum_i m_i$ where $m_i$ = number of attributes in the $i$-th dataset
$v$ — number of relevant (user-generated) visualizations across all users and datasets
$h$ — number of visualization configurations
$k$ — dimensionality of the shared attribute feature space, i.e., number of attribute features
$d$ — shared latent embedding dimensionality
$t$ — number of types of implicit/explicit user feedback, i.e., attribute and visualization click, like, add-to-dashboard, among others
$\mathbf{x}$ — an attribute (column) vector from an arbitrary user-uploaded dataset
$|\mathbf{x}|$ — cardinality of $\mathbf{x}$, i.e., number of unique values in $\mathbf{x}$
$\mathrm{nnz}(\mathbf{x})$ — number of nonzeros in a vector $\mathbf{x}$
$\mathrm{len}(\mathbf{x})$ — length of a vector $\mathbf{x}$
$\mathbf{A}$ — user by attribute preference matrix
$\mathbf{C}$ — user by visualization configuration matrix
$\mathbf{D}$ — attribute preference by visual-configuration matrix
$\mathbf{M}$ — attribute by meta-feature matrix
$\mathbf{U}$ — shared user embedding matrix
$\mathbf{V}$ — shared attribute embedding matrix
$\mathbf{Z}$ — shared visualization configuration embedding matrix
$\mathbf{Y}$ — meta-feature embedding matrix for the attributes across all datasets

Learning from other users' visualizations on different datasets is done by leveraging the similarity between the attributes (used in the visualizations) across completely different datasets (by first embedding the attributes from every dataset into a shared fixed-dimensional space), as well as the similarity between the visual-configurations of the relevant visualizations, despite them using completely different datasets. More formally, given any two users $i, i' \in [n]$ along with one of their relevant visualizations, $V_{ijk} = (\mathbf{X}^{(k)}_{ij}, \mathbf{C}_{ijk}) \in \mathcal{V}_{ij}$ and $V_{i'j'k'} = (\mathbf{X}^{(k')}_{i'j'}, \mathbf{C}_{i'j'k'}) \in \mathcal{V}_{i'j'}$, then since the datasets used in these visualizations are completely different, we can instead leverage this across-dataset training information if they use similar attributes, where across-dataset similarity is measured by first mapping each attribute used in the visualization to a shared $K$-dimensional meta-feature space, in which we can then measure the similarity between each of the attributes used in the visualizations generated by different users. Hence, $s\langle \Psi(\mathbf{X}^{(k)}_{ij}), \Psi(\mathbf{X}^{(k')}_{i'j'}) \rangle > 1 - \epsilon$ where $\mathbf{X}_{i'j'} \notin \mathcal{X}_i$ and $\mathbf{X}_{ij} \notin \mathcal{X}_{i'}$. Intuitively, this implies that even though the visualizations are generated using different data, they visualize data that is similar with respect to its overall characteristics and patterns.
By construction, visualizations $V_{ijk}$ and $V_{i'j'k'}$ from two different users $i, i' \in [n]$ and datasets $\mathbf{X}_{ij} \neq \mathbf{X}_{i'j'}$ may use the same visual-configuration (set of design choices), $\mathbf{C}_{ijk} = \mathbf{C}_{i'j'k'} \in \mathcal{C}$, since we defined the notion of visual-configurations to be data-independent; thus, even though two visualizations may visualize data attributes from completely different datasets, they can still share the same visual-configuration (design choices). Therefore, as we will see later, we are able to learn from other users with visualizations that use attributes from completely different datasets.

2.3 Recommending Personalized Visualizations

After learning the personalized visualization recommendation model $\mathcal{M}_i$ for an individual user $i \in [n]$ (Eq. 13), we can then use $\mathcal{M}_i$ to score and recommend the top most relevant visualizations for user $i$ from any arbitrary dataset $\mathbf{X}$. There are three possible cases that are naturally supported by the learned model $\mathcal{M}_i$ for recommending visualizations specifically of interest to user $i$ based on their past interactions (visualizations the user viewed/clicked or more generally interacted with):

(1) The dataset $\mathbf{X}$ used for recommending personalized visualizations to user $i$ via $\mathcal{M}_i$ can be a new, previously unseen dataset of interest, $\mathbf{X} \notin \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n\}$;
(2) The dataset $\mathbf{X}$ is not a previous dataset of interest to user $i$, but has been used previously by one or more other users, $\mathbf{X} \in \{\mathcal{X}_1, \ldots, \mathcal{X}_n\} \setminus \mathcal{X}_i$;
(3) The dataset $\mathbf{X} \in \mathcal{X}_i$ is a previous dataset of interest to user $i$.

A fundamental property of the personalized visualization recommendation problem is that the user visualization scores for an arbitrary visualization $V$ (that visualizes data from an arbitrary dataset $\mathbf{X}$) differ depending on the individual user and their historical preferences and interests. More formally, given users $i, i' \in [n]$ and a visualization $V$ from a new unseen dataset $\mathbf{X}_{\text{test}}$, we obtain personalized visualization scores for users $i$ and $i'$ as $\mathcal{M}_i(V)$ and $\mathcal{M}_{i'}(V)$, respectively. While existing rule-based [Moritz et al. 2018; Wongsuphasawat et al. 2015, 2017] or ML-based systems [Qian et al. 2020] score the visualization $V$ the same no matter the actual user of the system (hence, they are agnostic to the actual user and their interests, past interactions, and intent), our work instead focuses on learning individual personalized visualization recommendation models for every user $i \in [n]$ such that the personalized score $\mathcal{M}_i(V)$ of visualization $V$ for user $i$ is almost surely different from the score $\mathcal{M}_{i'}(V)$ given by the personalized model of another user $i'$, $\mathcal{M}_i(V) \neq \mathcal{M}_{i'}(V)$. We can state this more generally for all pairs of users $i, i' \in [n]$ with respect to a single arbitrary visualization $V$,
$$\mathcal{M}_i(V) \neq \mathcal{M}_{i'}(V), \quad \forall i, i' = 1, \ldots, n \;\; \text{s.t.} \;\; i < i' \quad (14)$$
Hence, given an arbitrary visualization $V$, the personalized scores $\mathcal{M}_i(V)$ and $\mathcal{M}_{i'}(V)$ for any two distinct users $i$ and $i'$ are not equal with high probability. This is due to the fact that the personalized visualization recommendation models $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_n$ capture each of the $n$ users' individual data preferences, design/visual preferences, and overall visualization preferences.

Definition 8 (Personalized Visualization Scoring).
Given the personalized visualization recommendation model $\mathcal{M}_i$ for user $i$ and a dataset $\mathbf{X}_{\text{test}}$ of interest to user $i$, we can obtain the personalized scores for user $i$ of every possible visualization that can be generated as
$$\mathcal{M}_i : \mathbb{X}_{\text{test}} \times \mathcal{C} \to \mathbb{R} \quad (15)$$
where $\mathbb{X}_{\text{test}} = \{\ldots, \mathbf{X}^{(k)}_{\text{test}}, \ldots\}$ is the space of attribute subsets from $\mathbf{X}_{\text{test}}$ and $\mathcal{C}$ is the space of visualization configurations. Hence, given an arbitrary visualization $V$, the learned model $\mathcal{M}_i$ outputs a personalized score for user $i$ describing the effectiveness or importance of the visualization with respect to that individual user.

Definition 9 (Personalized Visualization Ranking).
Given the set of generated visualizations $\mathbb{V}_{\text{test}} = \{V_1, V_2, \ldots, V_Q\}$ where $Q = |\mathbb{V}_{\text{test}}|$, we derive a personalized ranking of the visualizations $\mathbb{V}_{\text{test}}$ from $\mathbf{X}_{\text{test}}$ for user $i$ as follows:
$$\rho_i\big(\{V_1, V_2, \ldots, V_Q\}\big) = \operatorname*{arg\,sort}_{V_t \in \mathbb{V}_{\text{test}}} \; \mathcal{M}_i(V_t) \quad (16)$$
where for any two visualizations $V_t$ and $V_{t'}$ in the personalized ranking $\rho_i\big(\{V_1, V_2, \ldots, V_Q\}\big)$ of visualizations for the individual user $i$ (from dataset $\mathbf{X}_{\text{test}}$) such that $t < t'$, $\mathcal{M}_i(V_t) \geq \mathcal{M}_i(V_{t'})$ holds by definition.
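A minimal sketch of Definitions 8–9 in Python: score every candidate visualization with the user's model and argsort the scores in descending order. The function and argument names here are illustrative.

```python
def recommend(model_i, candidates, top_k=5):
    """Return the top-k visualizations ranked by the personalized score,
    i.e., ρ_i = argsort of M_i(V) over the candidate set (Eq. 16)."""
    ranked = sorted(candidates, key=model_i, reverse=True)
    return ranked[:top_k]
```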
3 FRAMEWORK

In this section, we present the framework for solving the personalized visualization recommendation problem from Section 2. In Section 3.1, we first describe the meta-feature learning approach for mapping user datasets to a shared universal meta-feature space where relationships between the corpus of tens of thousands of datasets can be automatically inferred and used for learning individual personalized models for each user. Section 3.2 then introduces a graph model that captures the data preferences of users, while Section 3.3 proposes graph models that naturally encode the visual preferences of users. The personalized visualization recommendation models learned from the proposed graph representations are described in Section 3.4, while the visualization scoring and recommendation techniques are presented in Section 3.5.

3.1 Meta-Feature Learning

To learn from user datasets of different sizes, types, and characteristics, we first embed the attributes (columns) of each dataset $\mathbf{X} \in \mathcal{X}_1 \cup \mathcal{X}_2 \cup \cdots \cup \mathcal{X}_n$ (from any user) in a shared $K$-dimensional meta-feature space. This also enables the personalized visualization recommendation model to learn from users with similar data preferences. Recall that each user $i \in [n]$ is associated with a set of datasets $\mathcal{X}_i = \{\mathbf{X}_{i1}, \mathbf{X}_{i2}, \ldots\}$.

Claim 3.1. Let $\mathcal{X} = \bigcup_{i=1}^{n} \mathcal{X}_i$ denote the set of all datasets. Then
$$\sum_{i=1}^{n} |\mathcal{X}_i| \geq |\mathcal{X}| \quad (17)$$

Hence, if $\sum_{i=1}^{n} |\mathcal{X}_i| = |\mathcal{X}|$, then all users have completely different datasets (there do not exist any two users $i, j \in [n]$ that have a dataset in common). Otherwise, if there exist two users that have at least one dataset in common with one another, then $\sum_{i=1}^{n} |\mathcal{X}_i| > |\mathcal{X}|$.

In our personalized visualization recommendation problem (Sec. 2), it is possible (and in many cases likely) that users are interested in completely different datasets. In the worst case, every user has a completely disjoint set of datasets, and thus the implicit and/or explicit user feedback regarding the attributes of interest to the users is also completely disjoint.
Table 2. Meta-feature learning framework overview.

Framework component — Examples
1. Data representations $\mathcal{G}$ — $\mathbf{x}$, $p(\mathbf{x})$, $g(\mathbf{x})$, $\ell_b(\mathbf{x})$ (log-binning), ...
2. Partitioning functions $\Pi$ — clustering, binning, quartiles, ...
3. Meta-feature functions $\psi$ — statistical, information-theoretic, ...
4. Meta-embedding of meta-features — $\arg\min_{\mathbf{H}, \mathbf{\Sigma}, \mathbf{Q}} \mathbb{D}_L\big(\mathbf{M} \,\|\, \mathbf{H}\mathbf{\Sigma}\mathbf{Q}^{\top}\big)$, then $\widehat{\mathbf{q}} = \mathbf{\Sigma}^{-1}\mathbf{H}^{\top}\widehat{\mathbf{m}}$

In such a case, the question then becomes: how can we leverage the feedback from users like this to better recommend attributes from different datasets that may be of interest to a new and/or previous user? To do this, we need a general method that can derive a fixed-size embedding $\Psi(\mathbf{x}) \in \mathbb{R}^K$ of an attribute $\mathbf{x}$ from any arbitrary dataset $\mathbf{X}$, such that the $K$-dimensional embedding $\Psi(\mathbf{x})$ captures the important data characteristics and statistical properties of $\mathbf{x}$, independent of the dataset and the size of $\mathbf{x}$. Afterwards, given two attributes $\mathbf{x}$ and $\mathbf{y}$ from different datasets (i.e., $\mathbf{X}$ and $\mathbf{Y}$) and users, we can derive the similarity between $\mathbf{x}$ and $\mathbf{y}$. Suppose there is implicit/explicit user feedback regarding an attribute $\mathbf{x}$; then, given another arbitrary user $i$ interested in a new dataset $\mathbf{Y}$ (without any feedback on the attributes in $\mathbf{Y}$), we can derive the similarity between $\mathbf{x}$ and $\mathbf{y}$, and if $\mathbf{y}$ is similar to an attribute $\mathbf{x}$ that was preferred by some user(s), we can assign the attribute $\mathbf{y}$ a higher probability (weight, score), despite it not yet having any user feedback. Therefore, this idea of transferring user feedback about attributes across different datasets is extremely powerful and fundamentally important for personalized visualization recommendation (especially when there is only limited, sparse feedback available). Moreover, the proposed idea above is also important when there is no feedback about an attribute in some dataset, or for a completely new dataset of interest to a user. This enables us to learn better personalized visualization recommendation models for individual users while requiring significantly less feedback.

Property 3.
Two attributes $\mathbf{x}$ and $\mathbf{y}$ are similar iff
$$s\langle \Psi(\mathbf{x}), \Psi(\mathbf{y}) \rangle > 1 - \epsilon \quad (18)$$
where $s\langle \cdot, \cdot \rangle$ is the similarity function.

Notice that since almost surely $|\mathbf{x}| \neq |\mathbf{y}|$ (different sizes), the similarity of $\mathbf{x}$ and $\mathbf{y}$ cannot be computed directly. Therefore, we embed $\mathbf{x}$ and $\mathbf{y}$ into the same $K$-dimensional meta-feature space where their similarity can be computed directly as $s\langle \Psi(\mathbf{x}), \Psi(\mathbf{y}) \rangle$.

Attributes from different datasets are naturally of different sizes, types, and even from different domains. Therefore, as shown above, there is no way to compute similarity between them directly. Instead, we propose to map each attribute from any arbitrary dataset into a shared $K$-dimensional space using meta-feature functions. After every attribute is mapped into this $K$-dimensional meta-feature space, we can then compare their similarity directly. In this work, we propose a meta-feature learning framework with four main components, as shown in Table 2. Many of the framework components use the meta-feature functions denoted $\psi$. A meta-feature function is a function that maps an arbitrary vector to a value that captures a specific characteristic of the vector of values. In this work, we leverage a large class of meta-feature functions, formally defined in Table 3. However, the framework is flexible and can leverage any arbitrary collection of meta-feature functions. Notably, mapping every attribute from any dataset into a low-dimensional meta-feature space enables the model to capture and learn from the similarity between user-preferred attributes in completely different datasets.

Let $\mathbf{x}$ denote an attribute (column vector) from any arbitrary user dataset. Then we may apply the collection of meta-features $\psi$ from Table 3 directly to $\mathbf{x}$ to obtain a low-dimensional representation of $\mathbf{x}$ as $\psi(\mathbf{x})$.
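The toy example below illustrates Property 3 under simplifying assumptions: a hand-rolled $\psi$ built from a few of the Table 3 statistics stands in for the full meta-feature function, and cosine similarity stands in for $s\langle \cdot, \cdot \rangle$. Note the two attributes have different lengths, so they cannot be compared directly, but their fixed-size meta-feature vectors can.

```python
import numpy as np

def psi(x):
    # A tiny stand-in for the meta-feature function (a few Table 3 stats).
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), x.std(), x.min(), x.max(),
                     np.median(x), len(np.unique(x)) / len(x)])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.random.normal(50, 5, size=1000)  # attribute from one dataset
y = np.random.normal(48, 6, size=350)   # different dataset, different size
print(cosine(psi(x), psi(y)))  # close to 1 for similarly distributed attributes
```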
In addition, we can also apply the meta-feature functions $\psi$ to various representations and transformations of $\mathbf{x}$. For instance, we can first derive the probability distribution $p(\mathbf{x})$ of $\mathbf{x}$ such that $p(\mathbf{x})^{\top}\mathbf{e} = 1$, and then use the meta-feature functions $\psi$ over $p(\mathbf{x})$ to characterize this representation of $\mathbf{x}$. We can also use the meta-feature functions to characterize other important representations and transformations of the attribute vector $\mathbf{x}$, such as different scale-invariant and dimensionless representations of the data, using different normalization functions $g_h(\cdot)$ over the attribute (column) vector $\mathbf{x}$; from each of these representations, we can apply the above meta-feature functions, e.g., $g_h(\mathbf{x}) = \frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})}$, then $\psi(g_h(\mathbf{x}))$. More generally, let $\mathcal{G} = \{g_1, g_2, \ldots, g_\ell\}$ denote a set of data representation and transformation functions that can be applied over an attribute vector $\mathbf{x}$ from any arbitrary user dataset. We first compute the meta-feature functions $\psi$ (e.g., from Table 3) over the $\ell$ different representations of the attribute vector $\mathbf{x}$ given by the functions $\mathcal{G} = \{g_1, g_2, \ldots, g_\ell\}$ as follows:
$$\psi(g_1(\mathbf{x})),\; \psi(g_2(\mathbf{x})),\; \ldots,\; \psi(g_\ell(\mathbf{x})) \quad (19)$$
Note that if $g_1 \in \mathcal{G}$ is the identity function, then $\psi(g_1(\mathbf{x})) = \psi(\mathbf{x})$. In all cases, the meta-feature function $\psi$ maps a vector of arbitrary size to a fixed-size, lower-dimensional vector.

For each of the different representation/transformation functions $\mathcal{G} = \{g_1, \ldots, g_\ell\}$ of the attribute vector $\mathbf{x}$, we use a partitioning function $\Pi$ to group the different values into $k$ different subsets (i.e., partitions, clusters, bins). Then we apply the meta-feature functions $\psi$ to each of the $k$ different groups as follows:
$$\underbrace{\psi(\Pi_1(g_1(\mathbf{x}))), \ldots, \psi(\Pi_k(g_1(\mathbf{x})))}_{g_1(\mathbf{x})},\; \ldots,\; \underbrace{\psi(\Pi_1(g_\ell(\mathbf{x}))), \ldots, \psi(\Pi_k(g_\ell(\mathbf{x})))}_{g_\ell(\mathbf{x})} \quad (20)$$
where $\Pi_k$ denotes the $k$-th partition of values from the partitioning function $\Pi$. Note that to ensure every attribute is mapped to the same $K$-dimensional meta-feature space, we only need to fix the number of partitions $k$. In Eq. 20 we show only a single partitioning function $\Pi$; however, multiple partitioning functions are used in this work, and each is applied in a similar fashion as in Eq. 20. All the meta-features derived from Eq. 19 and Eq. 20 are then concatenated into a single vector of meta-features describing the characteristics of the attribute $\mathbf{x}$.
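Putting Eqs. (19) and (20) together, the compact sketch below assumes identity and min-max normalization as the transformations $\mathcal{G}$, equal-frequency splits of the sorted values as the partitioning function $\Pi$, and four simple statistics as $\psi$; all of these are illustrative stand-ins for the full framework.

```python
import numpy as np

def psi(v):
    v = np.asarray(v, dtype=float)
    if v.size == 0:
        return np.zeros(4)
    return np.array([v.mean(), v.std(), v.min(), v.max()])

def minmax(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def Psi(x, num_bins=4):
    """Meta-feature vector Ψ(x): ψ over each transformation g(x), plus ψ
    over each of the k partitions of each g(x), all concatenated."""
    feats = []
    for g in (lambda v: v, minmax):                   # G = {identity, min-max}
        gx = g(np.asarray(x, dtype=float))
        feats.append(psi(gx))                          # Eq. (19)
        bins = np.array_split(np.sort(gx), num_bins)   # Π via equal-size splits
        feats.extend(psi(b) for b in bins)             # Eq. (20)
    return np.concatenate(feats)                       # concatenation, Eq. (21)

print(Psi(np.random.rand(100)).shape)  # fixed size, independent of len(x)
```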
More formally, the meta-feature function $\Psi : \mathbf{x} \to \mathbb{R}^K$ that combines the different components of the framework (Table 2) is defined as
$$\Psi(\mathbf{x}) = \big[\, \psi(g_1(\mathbf{x})) \;\cdots\; \psi(g_\ell(\mathbf{x})) \;\cdots\; \psi(\Pi_1(g_1(\mathbf{x}))) \;\cdots\; \psi(\Pi_k(g_1(\mathbf{x}))) \;\cdots\; \psi(\Pi_1(g_\ell(\mathbf{x}))) \;\cdots\; \psi(\Pi_k(g_\ell(\mathbf{x}))) \,\big] \quad (21)$$
The resulting $\Psi(\mathbf{x})$ is a $K$-dimensional meta-feature vector for attribute $\mathbf{x}$. Our approach is agnostic to the precise meta-feature functions used and is flexible for use with any alternative set of meta-feature functions (Table 3).

Let $\mathcal{X} = \cup_{i=1}^{n} \mathcal{X}_i$ denote the set of dataset matrices across all $n$ users. Given an arbitrary dataset matrix $\mathbf{X} \in \mathcal{X}$ (which can be shared among multiple users), let $\Psi(\mathbf{X}) \in \mathbb{R}^{K \times |\mathbf{X}|}$ be the resulting meta-feature matrix obtained by applying $\Psi$ independently to each of the $|\mathbf{X}|$ attributes (columns) of $\mathbf{X}$. Then we can derive the overall meta-feature matrix $\mathbf{M}$ as
$$\mathbf{M} = \bigoplus_{\mathbf{X} \in \mathcal{X}} \Psi(\mathbf{X}) \quad (22)$$
where $\bigoplus$ is the concatenation operator, i.e., $\Psi(\mathbf{x}) \oplus \Psi(\mathbf{y}) = [\Psi(\mathbf{x}) \;\; \Psi(\mathbf{y})] \in \mathbb{R}^{K \times 2}$. Note that Eq. 22 is not equivalent to $\bigoplus_{i=1}^{n} \bigoplus_{\mathbf{X} \in \mathcal{X}_i} \Psi(\mathbf{X})$ since any two users $i, j \in [n]$ can share one or more datasets. With a slight abuse of notation, let $d = |\mathcal{X}|$ and $\mathcal{X} = \{\mathbf{X}_1, \ldots, \mathbf{X}_d\}$; then $\mathbf{M} = \Psi(\{\mathbf{X}_1, \ldots, \mathbf{X}_d\})$, where $\mathbf{M}$ is a $K \times (|\mathbf{X}_1| + \cdots + |\mathbf{X}_d|)$ matrix.

Fig. 1. Similarity of attributes across different datasets using the attribute embeddings in the universal meta-feature space. (a) uses the first 1000 attributes across different datasets and takes the cosine similarity between each pair of attributes with respect to their fixed $K$-dimensional meta-feature vectors. (b) shows the cosine similarity between the attribute meta-features used to characterize the different attributes. See text for discussion.

In Figure 1, we investigate the similarity of attributes across different datasets in the personalized visualization corpus (Section 5) and observe two important findings. First, Figure 1(a) indicates that attributes across different datasets may be similar to one another, and the latent relationships between the attributes can benefit learning personalized visualization recommendation models, especially for users with very few or even no visualization feedback. Second, the meta-features used to characterize attributes from any arbitrary dataset are diverse and fundamentally different from one another, as shown in Figure 1(b). This finding is important and validates the proposed meta-feature learning framework, since the meta-features must be able to capture the fundamental patterns and characteristics of a dataset from any arbitrary domain.
Meta-Embedding of Meta-Features. We can derive an embedding using the current meta-feature matrix $\mathbf{M}$. Note that this meta-feature matrix may contain all meta-features across all previous datasets or simply the meta-features of a single dataset. However, the more datasets, the better the meta-embedding of the meta-features will reveal the important latent structures between the meta-features. We learn the latent structure in the meta-feature matrix $\mathbf{M}$ by solving
$$\arg\min_{\mathbf{H}, \mathbf{\Sigma}, \mathbf{Q}} \; \mathbb{D}_L\big(\mathbf{M} \,\|\, \mathbf{H}\mathbf{\Sigma}\mathbf{Q}^{\top}\big) \quad (23)$$
Given meta-features for a new attribute $\widehat{\mathbf{m}}$ in another arbitrary unseen dataset, we use the latent low-rank meta-embedding matrices to map the meta-feature vector $\widehat{\mathbf{m}} \in \mathbb{R}^{K}$ into the low-rank meta-embedding space as
$$\widehat{\mathbf{q}} = \mathbf{\Sigma}^{-1}\mathbf{H}^{\top}\widehat{\mathbf{m}} \quad (24)$$
(We assume w.l.o.g. that the columns of $\mathbf{M}$ and the meta-features of a new attribute $\widehat{\mathbf{m}}$ are normalized to length 1.) Hence, the meta-feature vector $\widehat{\mathbf{m}}$ of a new, previously unseen attribute is mapped into the same meta-embedding space as $\widehat{\mathbf{q}}$. The resulting meta-embedding of the meta-features of the new attribute can then be concatenated onto the meta-feature vector. This has several important advantages.
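One concrete way to realize Eqs. (23)–(24) is a truncated SVD; the paper's objective only requires some divergence-minimizing low-rank factorization $\mathbf{M} \approx \mathbf{H}\mathbf{\Sigma}\mathbf{Q}^{\top}$, so the SVD and the sizes below are illustrative choices.

```python
import numpy as np

K, num_attrs, rank = 1006, 5000, 10
M = np.random.rand(K, num_attrs)
M /= np.linalg.norm(M, axis=0, keepdims=True)  # unit-norm columns (w.l.o.g.)

H, s, Qt = np.linalg.svd(M, full_matrices=False)
H, s = H[:, :rank], s[:rank]                   # keep the top-`rank` factors

def embed_new_attribute(m_hat):
    """Map the meta-features of an unseen attribute into the same
    meta-embedding space: q̂ = Σ⁻¹ Hᵀ m̂  (Eq. 24)."""
    m_hat = m_hat / np.linalg.norm(m_hat)
    return (H.T @ m_hat) / s

q_hat = embed_new_attribute(np.random.rand(K))
print(q_hat.shape)  # (10,)
```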
Num. instances | x | Speed, ScalabilityNum. missing values 𝑠 Imputation effectsFrac. of missing values | x |− 𝑠 / | x | Imputation effectsNum. nonzeros nnz ( x ) Imputation effectsNum. unique values card ( x ) Imputation effectsDensity nnz ( x ) / | x | Imputation effects 𝑄 , 𝑄 median of the | x | / − IQR 𝑄 − 𝑄 − Outlier LB 𝛼 ∈ { . , } (cid:205) 𝑖 I ( 𝑥 𝑖 < 𝑄 − 𝛼𝐼𝑄𝑅 ) Data noisinessOutlier UB 𝛼 ∈ { . , } (cid:205) 𝑖 I ( 𝑥 𝑖 > 𝑄 + 𝛼𝐼𝑄𝑅 ) Data noisinessTotal outliers 𝛼 ∈ { . , } (cid:205) 𝑖 I ( 𝑥 𝑖 < 𝑄 − 𝛼𝐼𝑄𝑅 ) + (cid:205) 𝑖 I ( 𝑥 𝑖 > 𝑄 + 𝛼𝐼𝑄𝑅 ) Data noisiness( 𝛼 std) outliers 𝛼 ∈ { , } 𝜇 x ± 𝛼𝜎 x Data noisinessSpearman ( 𝜌 , p-val) spearman ( x , 𝜋 ( x )) SequentialKendall ( 𝜏 , p-val) kendall ( x , 𝜋 ( x )) SequentialPearson ( 𝑟 , p-val) pearson ( x , 𝜋 ( x )) SequentialMin, max min ( x ) , max ( x ) − Range max ( x ) − min ( x ) Attribute normalityMedian med ( x ) Attribute normalityGeometric Mean | x | − (cid:206) 𝑖 𝑥 𝑖 Attribute normalityHarmonic Mean | x | / (cid:205) 𝑖 𝑥 𝑖 Attribute normalityMean, Stdev, Variance 𝜇 x , 𝜎 x , 𝜎 x Attribute normalitySkewness E ( x − 𝜇 x ) / 𝜎 x Attribute normalityKurtosis E ( x − 𝜇 x ) / 𝜎 x Attribute normalityHyperSkewness E ( x − 𝜇 x ) / 𝜎 x Attribute normalityMoments [6-10] − Attribute normalityk-statistic [3-4] − Attribute normalityQuartile Dispersion Coeff. 𝑄 − 𝑄 𝑄 + 𝑄 DispersionMedian Absolute Deviation med (| x − med ( x )|) DispersionAvg. Absolute Deviation | x | e 𝑇 | x − 𝜇 x | DispersionCoeff. of Variation 𝜎 x / 𝜇 x DispersionEfficiency ratio 𝜎 x / 𝜇 x DispersionVariance-to-mean ratio 𝜎 x / 𝜇 x DispersionSignal-to-noise ratio (SNR) 𝜇 x / 𝜎 x Noisiness of dataEntropy 𝐻 ( x ) = − (cid:205) 𝑖 𝑥 𝑖 log 𝑥 𝑖 Attribute InformativenessNorm. entropy 𝐻 ( x ) / log | x | Attribute InformativenessGini coefficient − Attribute InformativenessQuartile max gap max ( 𝑄 𝑖 + − 𝑄 𝑖 ) DispersionCentroid max gap max 𝑖 𝑗 | 𝑐 𝑖 − 𝑐 𝑗 | DispersionHistogram prob. dist. p ℎ = hh 𝑇 e (with fixed thousands of meta-features for a single attribute. Many of these meta-features may not be importantfor a specific attribute, while a few meta-features may be crucial in describing the attribute and itsdata characteristics. Therefore, the meta-embedding of the meta-features can be viewed as a noisereduction step that essentially removes redundant or noisy signals from the data while preserving :16 X. Qian et al. the most important signals that describe the fundamental direction and characteristics of the data.Second, the meta-embedding of the meta-features reveals the latent structure and relationshipsin the meta-features. This step can also be viewed as a type of landmark feature since we solve alearning problem to find a low-rank approximation of M such that M ≈ H 𝚺 Q ⊤ .However, we include it as a different component of the meta-feature learning framework inTable 2 since instead of concatenating the meta-embedding of the meta-features for a attribute, wecan also use it directly by replacing it with the meta-feature vector. This is especially importantwhen there is a large number of datasets ( e.g. , more than 100K datasets with millions of attributesin total) for learning. For instance, if there are 2.3M attributes (see 100K dataset in Table 4), andeach attribute is encoded with a dense 𝐾 = M has1006 × , ,
000 values that need to be stored, which use 18.5GB space (assuming 8 bytes pervalue). However, if we use the meta-embedding of the meta-features with 𝐾 =
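To make Eqs. 23-24 concrete, the following is a minimal sketch of the meta-embedding step using a truncated SVD as the low-rank factorization, assuming numpy; the function names and the toy dimensions (1006 meta-features, 500 attributes, K = 10) are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_meta_embedding(M, K=10):
    """Low-rank factorization M ~= H @ Sigma @ Q.T (Eq. 23) via truncated SVD,
    where M is the k x m meta-feature-by-attribute matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    H = U[:, :K]              # k x K left singular vectors
    Sigma = np.diag(s[:K])    # K x K diagonal of singular values
    Q = Vt[:K, :].T           # m x K meta-embeddings of the known attributes
    return H, Sigma, Q

def embed_new_attribute(m_new, H, Sigma):
    """Fold the meta-feature vector of a previously unseen attribute into the
    same K-dimensional meta-embedding space (Eq. 24): q = Sigma^-1 H.T m."""
    return np.linalg.inv(Sigma) @ (H.T @ m_new)

# Toy usage: 1006 meta-features, 500 attributes, K = 10.
rng = np.random.default_rng(0)
M = rng.random((1006, 500))
H, Sigma, Q = fit_meta_embedding(M, K=10)
q_hat = embed_new_attribute(rng.random(1006), H, Sigma)
print(q_hat.shape)  # (10,)
```

With K = 10, each attribute is represented by 10 values instead of 1006, which is exactly the source of the space savings discussed above.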
User-level Data Preferences Across Different Datasets

Given users i and j that provide feedback on attributes of interest from two completely different datasets, how can we leverage this user feedback (data preferences) despite it being across different datasets without any shared attributes? To address this important problem, we propose a novel representation and model that naturally enables the transfer of user-level data preferences across different datasets, improving predictive performance and recommendations while reducing data sparsity. This across-dataset transfer learning of user-level data preferences becomes possible due to the proposed representation and model for personalized visualization recommendation.

We now introduce the novel user-level data preference graph model for personalized visualization recommendation, which naturally enables across-dataset and across-user transfer learning of preferences. This model encodes users, their interactions with attributes (columns/variables from any arbitrary dataset), and the meta-features of the attributes. This new representation enables us to learn from user-level data preferences across different datasets and users, and is therefore very important for personalized visualization recommendation systems. In particular, we first derive the user-by-attribute preference matrix A as follows:

    A = [A]_{ij} = 1 if user i clicked (a visualization with) attribute j, and 0 otherwise    (25)

In terms of the implicit or explicit user "action" encoded by A_{ij}, it could be an implicit user action, such as the user clicking or hovering over a specific attribute j, or clicking or hovering over a visualization that uses attribute j. Similarly, A_{ij} can encode an explicit user action/feedback, such as the user explicitly liking attribute j (independent of a visualization) or, more generally, liking or adding to their dashboard a visualization that uses attribute j. In other words, there are two different types of explicit and implicit user feedback about attributes: feedback regarding a visualization that uses attribute j (whether the action is a click, hover, add-to-dashboard, etc.), or, more directly, whether a user liked or clicked on an attribute in the dataset via some UI.

Given A defined in Eq. 25, we are able to learn from two or more users that have at least one attribute preference in common. More precisely, A_{i,:} A_{j,:}^⊤ > 0 for users i and j implies that the two users share a dataset of interest and have preferred at least one of the same attributes in that dataset. Unfortunately, finding two users that satisfy this constraint is often unlikely. Therefore, we need to add another representation to A that creates meaningful connections between attributes in different datasets based on their similarity. In particular, we do this by leveraging the meta-feature matrix M derived in Section 3.1 by mapping every attribute in a user-specific dataset to a k-dimensional meta-feature vector. This defines a universal meta-feature space that is shared among the attributes of any arbitrary dataset, thereby allowing us to learn connections between users and their preferred attributes in completely different datasets.
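A minimal sketch of constructing the sparse user-by-attribute matrix A of Eq. 25 with scipy follows; the interaction-log layout (user index, global attribute index) is a hypothetical format for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical interaction log: (user index, attribute index) pairs, where the
# attribute index is global across all datasets in the corpus.
interactions = [(0, 3), (0, 7), (1, 3), (2, 42)]

n_users, m_attrs = 3, 50
rows = [u for u, _ in interactions]
cols = [a for _, a in interactions]
vals = np.ones(len(interactions))

# User-by-attribute preference matrix A (Eq. 25):
# A[i, j] = 1 iff user i clicked (a visualization with) attribute j.
A = csr_matrix((vals, (rows, cols)), shape=(n_users, m_attrs))

# Users 0 and 1 share a preferred attribute (index 3), so A_{0,:} A_{1,:}^T > 0.
print((A[0] @ A[1].T).toarray())  # [[1.]]
```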
This component is very important: without it, we have no way to learn from other users (and across different datasets), since each user has their own datasets, and thus their own set of visualizations (each consisting of a set of design choices and data choices) that are not shared by any other user.

Fig. 2. Overview of the proposed graph model for personalized visualization recommendation. Links between meta-features and data attributes represent the meta-feature matrix M from Section 3.1, whereas links between users and their preferred data attributes represent A (Section 3.2). Both of these capture the user-level data preferences across different datasets. Links between users and their visualization configurations represent C and capture the user-level visual preferences. Finally, links between visualization configurations and data attributes represent D. Note that all links are weighted, and the data attribute/column nodes of a specific dataset are grouped together.

User-level Visual Preferences Across Different Datasets
Visualizations naturally consist of data and visual design choices. Because a visualization depends on its data, the visualizations generated for one dataset are completely different from those generated for any other dataset. This is problematic if we want to develop a personalized visualization recommendation system that can learn from the visual preferences of users even when those users have not preferred any visualizations from the same datasets. This matters because visualizations are fundamentally tied to a dataset, and each user may have their own set of datasets that are not shared by any other user. Moreover, even if two users had a dataset in common, the probability that they prefer the same visualization is very small (almost surely zero) due to the exponential space of visualizations that can arise from a single dataset. To overcome these issues, we introduce a dataset-independent notion of a visualization called a visualization configuration, and propose a novel graph representation built on it that enables us to learn from the visual preferences of users even when they have not preferred any visualizations from the same datasets. A visualization configuration is an abstraction of a visualization: instead of mapping specific data attributes to specific design choices of the visualization (e.g., x, y, color), we replace them with their general type (e.g., numerical, categorical, temporal, ...) or another general property (or set of properties) that generalizes across datasets. Most importantly, it is this replacement of data-specific design choices with general properties that enables us to capture and learn from visualization configurations across users and datasets. More formally,
Definition 10 (Visual Configuration).
Given a visualization V consisting of a set of design choices and data (attributes) associated with a subset of those design choices. For every design choice, such as chart-type, there is a set of possible options; other design choices, such as color, can also be associated with a set of options, e.g., static color definitions, or a color map for specific data attributes. Let T : x → P be a function that maps an attribute x of some arbitrary dataset X ∈ 𝒳 to a property P that generalizes across any arbitrary dataset and is therefore independent of the specific dataset. Hence, given attributes x and y from two different datasets, it is possible that T(x) = P and T(y) = P. A visualization configuration is defined as an abstraction of a visualization in which every design choice that is bound to a data attribute is replaced with a general property T(x) = P of that attribute.

Claim 3.2.
There exist attributes x and y from different datasets such that T(x) = P and T(y) = P hold.

The space of visualization configurations is large, since visualization configurations arise from all possible combinations of design choices and their values, such as:
• chart-type: bar, scatter, ...
• x-type: quantitative, nominal, ordinal, temporal, ..., none
• y-type: quantitative, nominal, ordinal, temporal, ..., none
• color: red, green, blue, ...
• size: ...
• x-aggregate: sum, mean, bin, ..., none
• y-aggregate: sum, mean, bin, ..., none
• ...
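As an illustration of Definition 10, the following is a minimal sketch of the mapping T applied to a toy visualization; the Vega-Lite-style dict layout, the channel names, and the attribute-type lookup are hypothetical simplifications, not the actual corpus format.

```python
# A hypothetical visualization spec: design choices bound to concrete attributes.
visualization = {
    "chart_type": "scatter",
    "x": {"attribute": "price", "aggregate": None},
    "y": {"attribute": "rating", "aggregate": "mean"},
    "color": {"attribute": "category"},
}

# Assumed attribute-type lookup for this dataset: the general property T(x).
attribute_types = {"price": "quantitative", "rating": "quantitative",
                   "category": "nominal"}

def to_visual_configuration(vis, attr_types):
    """Replace every attribute bound to a design choice with its general type,
    yielding a dataset-independent visualization configuration (Definition 10)."""
    config = {"chart_type": vis["chart_type"]}
    for channel in ("x", "y", "color"):
        enc = vis.get(channel)
        if enc is None:
            config[f"{channel}_type"] = "none"
            continue
        config[f"{channel}_type"] = attr_types[enc["attribute"]]
        if "aggregate" in enc:
            config[f"{channel}_aggregate"] = enc["aggregate"] or "none"
    return config

print(to_visual_configuration(visualization, attribute_types))
# {'chart_type': 'scatter', 'x_type': 'quantitative', 'x_aggregate': 'none',
#  'y_type': 'quantitative', 'y_aggregate': 'mean', 'color_type': 'nominal'}
```

Two visualizations over entirely different datasets can map to the same configuration, which is precisely what makes the configuration shareable across users.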
A visualization configuration together with the selected attributes is everything necessary to generate a visualization. In Figure 3, we provide a toy example showing the process of extracting a data-independent visual configuration from a visualization.

Fig. 3. Visualization to Data-Independent Visual Configuration. A visualization consists of the data along with a set of design choices, and is thus dataset dependent. In this work, we propose the notion of a visual configuration that removes the data dependency of a visualization while capturing the visual design choices. Notably, visual configurations naturally generalize across datasets: since they are independent of any dataset, a visual configuration, unlike a visualization, can be shared among users that use entirely different datasets. The visualization in this toy example uses only two data attributes from the dataset; actual visual configurations have many other design choices, which are omitted for simplicity.

Using the notion of a visualization configuration (Definition 10), we can now introduce a model that captures the visualization preferences of users while ensuring that those preferences are not tied to specific datasets. In particular, we define the visual preference matrix C as follows:

    C = [C]_{ij} = 1 if user i clicked visualization configuration j, and 0 otherwise    (26)

Other similar visual preference matrices include C_{ij} = 1 if user i performed an action ∈ {clicked, hovered, liked, added-to-dashboard} on visualization configuration j. From the proposed graph model in Eq. 26, we can directly learn from user-level visual preferences across different datasets. This novel user-level visual preference graph model encodes users and their visual-configurations. Since each visual-configuration node represents a set of design choices that is, by definition, not tied to a user-specific dataset, the model can use this user-level visual graph to infer and make connections to other similar visual-configurations likely to be of interest to a user. This new graph model is critical, since it allows the learning component to leverage user-level visual preferences (visual-configurations) across the different datasets and users. Without this component, there would be no way to learn from other users' visual preferences (sets of design choices).

To see why the abstraction is needed, suppose we have a JSON encoding of an actual visualization, i.e., the design choices plus the actual attributes and their data (from which the visualization can be recreated precisely). Such an encoding is not useful for personalized visualization recommendation, since it is clearly tied to the specific dataset used by a single user. Hence, if we used visualizations directly, the optimization method used to obtain the embeddings for inference could not exploit other users' preferences, since those preferences would also be for visualizations tied to other datasets. The visualization-configuration overcomes this issue by removing the data dependency.
In particular, given a visualization, which includes the design choices plus the data choices (e.g., the data used for the x, y, and color channels), we derive a visualization-configuration by replacing the data attributes and attribute names with general properties that are dataset-independent. In this work we use the type of the attribute (e.g., categorical, real-valued), but any other general property of the data could be used as well. Most importantly, this abstraction enables us to learn from users and their visual preferences (design choices), even when those preferences concern visualizations generated from completely different datasets, because the visualization-configuration is carefully designed to generalize across datasets; being independent of the dataset at hand, it can be shared among users. Notice that traditional recommender systems for movies or products are comparatively simple, since they assume a single universal dataset (a set of movies or items/products) that all users share and provide feedback on. None of these simple assumptions hold for visualization recommendation, which is why we had to develop these new notions and models for learning.

Notice that the matrix C encodes the data-independent visual preferences of each user, which is very important. However, this representation does not capture how the visual configurations map to the actual data preferences of the users (the attributes, and the general meta-features that characterize them). Therefore, we introduce another representation to encode these important associations. In particular, we encode the attributes associated with each visual-configuration as

    D = [D]_{kt} = 1 if attribute k was used in a visual-configuration t clicked by some user, and 0 otherwise    (27)

As an example, given a relevant visualization V = (X^{(k)}_{ij}, C_t) ∈ 𝒱_{ij} of user i ∈ [n] for dataset X_{ij} with attributes X^{(k)}_{ij} = [x_p x_q] and visual-configuration C_t ∈ 𝒞, we set D_{pt} = D_{pt} + 1 and D_{qt} = D_{qt} + 1. We repeat this for all relevant visualizations of each user. In Figure 2, we provide an overview of the proposed personalized visualization recommendation graph model.
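A minimal sketch of accumulating the attribute-by-configuration matrix D of Eq. 27 from users' relevant visualizations; the list-of-tuples layout for the relevant visualizations is an assumed toy format.

```python
import numpy as np

m_attrs, h_configs = 50, 20   # number of attributes and visual-configurations

# Hypothetical relevant visualizations: (attribute indices used, config index).
relevant_vis = [([3, 7], 11), ([3], 11), ([42, 5], 2)]

# Attribute-by-configuration matrix D (Eq. 27): D[k, t] counts how often
# attribute k was used in a relevant visualization with configuration t.
D = np.zeros((m_attrs, h_configs))
for attrs, t in relevant_vis:
    for p in attrs:
        D[p, t] += 1

print(D[3, 11])  # 2.0 -- attribute 3 appeared twice with configuration 11
```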
We first introduce the PVisRec model, which uses the learned meta-feature matrix M from Section 3.1 and the graph representations proposed in Section 3.2 for capturing the shared data preferences between users despite their using completely different datasets, along with the graph representations from Section 3.3 that capture the visual preferences of users across all datasets in the corpus. We then discuss two variants of PVisRec that are investigated later in Section 6.

Given the sparse user-by-attribute adjacency matrix A ∈ R^{n×m}, the dense meta-feature-by-attribute matrix M ∈ R^{k×m}, the sparse user-by-visual-configuration adjacency matrix C ∈ R^{n×h}, and the sparse attribute-by-visual-configuration adjacency matrix D ∈ R^{m×h}, the goal is to find the rank-d embedding matrices U, V, Z, and Y that minimize the following objective function:

    f(U, V, Z, Y) = ‖A − UV^⊤‖²_F + ‖M − YV^⊤‖²_F + ‖C − UZ^⊤‖²_F + ‖D − VZ^⊤‖²_F    (28)

where U ∈ R^{n×d}, V ∈ R^{m×d}, Z ∈ R^{h×d}, and Y ∈ R^{k×d} are low-rank d-dimensional embeddings of the users, attributes (across all datasets), visual-configurations, and meta-features, respectively. The formulation above uses squared error, though other loss functions can also be used (e.g., Bregman divergences) [Singh and Gordon 2008]. We can solve Eq. 28 by computing the gradient and then using a first-order optimization method [Schenker et al. 2021]. Afterwards, we have

    A ≈ A′ = UV^⊤ = Σ_{r=1}^{d} u_r v_r^⊤    (29)
    M ≈ M′ = YV^⊤ = Σ_{r=1}^{d} y_r v_r^⊤    (30)
    C ≈ C′ = UZ^⊤ = Σ_{r=1}^{d} u_r z_r^⊤    (31)
    D ≈ D′ = VZ^⊤ = Σ_{r=1}^{d} v_r z_r^⊤    (32)

Solving Eq. 28 corresponds to the PVisRec model investigated later in Section 6. We also investigate a few variants of the PVisRec model that use only a subset of the graph representations {A, C, D} and/or the dense meta-feature matrix M introduced in Sections 3.1-3.3.

PVisRec (A, C, M only): Given the user-by-attribute matrix A ∈ R^{n×m}, meta-feature-by-attribute matrix M ∈ R^{k×m}, and user-by-visual-configuration matrix C ∈ R^{n×h}, the goal is to find the rank-d embedding matrices U, V, Z, and Y that minimize the following objective function:

    f(U, V, Z, Y) = ‖A − UV^⊤‖²_F + ‖M − YV^⊤‖²_F + ‖C − UZ^⊤‖²_F    (33)

PVisRec (A, C, D only): Besides Eq. 33, which uses only A, M, and C, we also investigate a personalized visualization recommendation model that uses A, C, and D (without meta-features). More formally, given A, C, and D, the problem is to learn low-dimensional rank-d embedding matrices U, V, and Z that minimize the following:

    f(U, V, Z) = ‖A − UV^⊤‖²_F + ‖C − UZ^⊤‖²_F + ‖D − VZ^⊤‖²_F    (34)

In this work, we used an ALS-based optimizer to solve Eq. 28 and the simpler variants shown in Eq. 33 and Eq. 34. However, we can also leverage a variety of different optimization schemes, including cyclic/block coordinate descent [Kim et al. 2014; Rossi and Zhou 2016], stochastic gradient descent [Oh et al. 2015; Yun et al. 2014], among others [Balasubramaniam et al. 2020; Bouchard et al. 2013; Choi et al. 2019; Schenker et al. 2021; Singh and Gordon 2008].
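The paper solves Eq. 28 with an ALS-based optimizer; the following is a plain gradient-descent sketch of the same coupled objective, assuming dense numpy arrays and illustrative hyperparameters (learning rate, iteration count), not the authors' implementation.

```python
import numpy as np

def pvisrec_factorize(A, M, C, D, d=10, lr=1e-3, iters=500, seed=0):
    """Gradient-descent sketch of the coupled factorization objective (Eq. 28)."""
    n, m = A.shape
    k, h = M.shape[0], C.shape[1]
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n, d))   # user embeddings
    V = 0.1 * rng.standard_normal((m, d))   # attribute embeddings
    Z = 0.1 * rng.standard_normal((h, d))   # visual-configuration embeddings
    Y = 0.1 * rng.standard_normal((k, d))   # meta-feature embeddings
    for _ in range(iters):
        # Residuals of the four reconstruction terms.
        Ra, Rm = A - U @ V.T, M - Y @ V.T
        Rc, Rd = C - U @ Z.T, D - V @ Z.T
        # Gradients of the squared-error terms w.r.t. each factor.
        gU = -2.0 * (Ra @ V + Rc @ Z)
        gV = -2.0 * (Ra.T @ U + Rm.T @ Y + Rd @ Z)
        gZ = -2.0 * (Rc.T @ U + Rd.T @ V)
        gY = -2.0 * (Rm @ V)
        U -= lr * gU
        V -= lr * gV
        Z -= lr * gZ
        Y -= lr * gY
    return U, V, Z, Y
```

Dropping the D term (and gZ's second summand) recovers the (A, C, M)-only variant of Eq. 33; dropping the M term recovers the (A, C, D)-only variant of Eq. 34.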
We first discuss using the personalized visualization recommendation model for recommending attributes to users, as well as visual-configurations. Then we discuss the fundamentally more challenging task of personalized visualization recommendation.

The ranking of attributes for user i is induced by U_{i,:} V^⊤, where U_{i,:} is the embedding of user i. Let π_1(U_{i,:} V^⊤) denote the largest attribute weight for user i. The top-k attribute weights for user i are then denoted as

    π_1(U_{i,:} V^⊤), π_2(U_{i,:} V^⊤), ..., π_k(U_{i,:} V^⊤)

Similarly, the personalized ranking of the visual-configurations for user i is inferred from U_{i,:} Z^⊤, where U_{i,:} is the embedding of user i and Z is the matrix of visual-configuration embeddings. Hence, U_{i,:} Z^⊤ ∈ R^h is an h-dimensional vector of weights indicating the likelihood/importance of each visual-configuration for that specific user. Letting π_1(U_{i,:} Z^⊤) denote the largest visual-configuration weight for user i, the top-k visual-configuration weights for user i are denoted as

    π_1(U_{i,:} Z^⊤), π_2(U_{i,:} Z^⊤), ..., π_k(U_{i,:} Z^⊤)

We now focus on the most complex and challenging problem of recommending complete visualizations personalized for a specific user i ∈ [n]. A recommended visualization for user i consists of both the subset of attributes X^{(k)} from some dataset X and the design choices C_t (a visual-configuration) for those attributes. Given user i along with an arbitrary visualization V = (X^{(k)}, C_t) generated from some dataset X of interest to user i, we derive a personalized user-specific score for visualization V as

    ŷ(V) = U_{i,:} Z_{t,:}^⊤ ∏_{x_j ∈ X^{(k)}} U_{i,:} V_{j,:}^⊤    (35)

where X^{(k)} is the subset of attributes from the user's dataset X (hence |X^{(k)}| ≤ |X|) used in the visualization V, and C_t ∈ 𝒞 is the visual-configuration of the visualization being scored for user i. Using Eq. 35, we can predict the personalized visualization score ŷ(V) for any arbitrary visualization (from any dataset) and user i ∈ [n]. For the evaluation in Section 6.1, we use Eq. 35 to score relevant and non-relevant visualizations for a specific user and dataset of interest.
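A minimal sketch of the scoring rule in Eq. 35, given embedding matrices U, V, Z learned as above; the candidate format is hypothetical and candidate generation is elided.

```python
def score_visualization(i, t, attr_idx, U, V, Z):
    """Personalized score (Eq. 35) for a visualization with configuration t and
    attributes attr_idx, for user i: the user/configuration affinity times the
    product of the user/attribute affinities."""
    score = U[i] @ Z[t]
    for j in attr_idx:
        score *= U[i] @ V[j]
    return score

# Ranking candidates for user i, where each candidate is a hypothetical
# (config index, attribute index list) pair:
# ranked = sorted(candidates, reverse=True,
#                 key=lambda c: score_visualization(i, c[0], c[1], U, V, Z))
```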
We now introduce a deep neural network architecture for personalized visualization recommendation. For this, we combine the previously proposed model with a deep multilayer neural network component to learn non-linear functions that capture complex dependencies and patterns between users and their visualization preferences.

Given an arbitrary user i and a visualization V = (X^{(k)}_{ij}, C_t) to score from some new dataset of interest to that user, we first must decide on the input representation. In this work, we leverage the personalized embeddings learned in Section 3.4 by concatenating the embedding of user i and of visual configuration t, along with the embeddings of each attribute used in the visualization. More formally,

    φ(V = ⟨X^{(k)}_{ij}, C_t⟩) = [u_i ; z_t ; v_{r_1} ; ... ; v_{r_s}]    (36)

where u_i is the embedding of user i, z_t is the embedding of visual-configuration C_t, and v_{r_1}, ..., v_{r_s} are the embeddings of the attributes used in the visualization being scored for user i. This can be written as

    φ(V = ⟨X^{(k)}_{ij}, C_t⟩) = [U^⊤ e_i ; Z^⊤ e_t ; V^⊤ e_{r_1} ; ... ; V^⊤ e_{r_s}]^⊤    (37)

where e_i ∈ R^n (user i), e_t ∈ R^h (visual-configuration C_t), and e_r ∈ R^m (attribute r) are the one-hot encodings of the user i, visual-configuration t, and attributes r_1, ..., r_s used in the visualization. Note that U ∈ R^{n×d}, V ∈ R^{m×d}, Z ∈ R^{h×d}, and Y ∈ R^{k×d}.

The first neural personalized visualization recommendation architecture that we introduce, called Neural PVisRec, leverages the user, visual-configuration, and attribute embeddings from the PVisRec model in Section 3.4 as input to a deep multilayer neural network with L fully-connected layers:

    φ(V = ⟨X^{(k)}_{ij}, C_t⟩) = [U^⊤ e_i ; Z^⊤ e_t ; V^⊤ e_{r_1} ; ... ; V^⊤ e_{r_s}]^⊤    (38)
    q_1 = σ_1(W_1 φ(V) + b_1)    (39)
    q_2 = σ_2(W_2 q_1 + b_2)    (40)
    ...
    q_L = σ_L(W_L q_{L−1} + b_L)    (41)
    ŷ = σ(h^⊤ q_L)    (42)

where W_ℓ, b_ℓ, and σ_ℓ are the weight matrix, bias vector, and activation function of layer ℓ. Further, ŷ = σ(h^⊤ q_L) (Eq. 42) is the output layer, where σ is the output activation function and h denotes the edge weights of the output function. For the hidden layers, we use ReLU as the activation function. Note that if the visualization does not use all s attributes, we pad the remaining unused attribute slots with zeros; this keeps the multilayer architecture flexible for visualizations with any number of attributes. Eqs. 38-42 can be written more succinctly as

    ŷ = σ( h^⊤ σ_L( W_L( ... σ_1( W_1 [U^⊤ e_i ; Z^⊤ e_t ; V^⊤ e_{r_1} ; ... ; V^⊤ e_{r_s}]^⊤ + b_1 ) ... ) + b_L ) )    (43)

where ŷ is the predicted visualization score for user i.

We also investigated a second neural approach, Neural PVisRec-CMF, for the personalized visualization recommendation problem. This approach combines the scores from PVisRec and Eq. 43. More formally, given user i along with an arbitrary visualization V = (X^{(k)}_{ij}, C_t) generated from some dataset X_{ij} of interest to user i, we derive a personalized user-specific score ŷ_PVisRec = U_{i,:} Z_{t,:}^⊤ ∏_{x_j ∈ X^{(k)}} U_{i,:} V_{j,:}^⊤, where X^{(k)}_{ij} is the subset of attributes used in the visualization V from the user's dataset X_{ij} (hence |X^{(k)}| ≤ |X_{ij}|) and C_t ∈ 𝒞 is the visual-configuration of V. Then, we have

    ŷ = (1 − α) ( U_{i,:} Z_{t,:}^⊤ ∏_{x_j ∈ X^{(k)}} U_{i,:} V_{j,:}^⊤ ) + α ŷ_dnn    (44)

where ŷ_dnn = σ( h^⊤ σ_L( W_L( ... σ_1( W_1 φ(V) + b_1 ) ... ) + b_L ) ) with φ(V) = [U^⊤ e_i ; Z^⊤ e_t ; V^⊤ e_{r_1} ; ... ; V^⊤ e_{r_s}]^⊤, and α ∈ (0, 1) is a hyperparameter that controls the influence of the two models on the final predicted score of the visualization for user i.

All layers of the various neural architectures for our personalized visualization recommendation problem use ReLU nonlinear activations. Unless otherwise mentioned, we used three hidden layers and optimized model parameters using mini-batch Adam with a learning rate of 0.001. We designed the neural network structure such that the bottom layers are the widest and each successive layer has half the number of neurons. For fairness, the last hidden layer is set to the embedding size; hence, if the embedding size is 8, the architecture of the layers is 32 → 16 → 8.

The user-centric visualization training corpus 𝒟 = {𝒳_i, 𝒱_i}_{i=1}^{n} for personalized visualization recommendation consists of user-level training data for n users, where for each user i ∈ [n] we have a set of datasets 𝒳_i = {X_{i1}, ..., X_{ij}, ...} of interest to that user, along with user i's "relevant" (generated, liked, clicked-on) visualizations 𝒱_i = {𝒱_{i1}, ..., 𝒱_{ij}, ...} for each of those datasets. For each user i ∈ [n] and dataset X_{ij} ∈ 𝒳_i of interest to user i, there is a set 𝒱_{ij} = {..., V = (X^{(k)}_{ij}, C_{ijk}), ...} of relevant (positive) visualizations for that user, and we also leverage a sampled set of non-relevant (negative) visualizations 𝒱⁻_{ij} for that user i and dataset X_{ij} ∈ 𝒳_i. Therefore, the set of training visualizations for user i ∈ [n] and dataset X_{ij} ∈ 𝒳_i is 𝒱_{ij} ∪ 𝒱⁻_{ij}, and Y_{ijk} ∈ {0, 1} denotes the ground-truth label of visualization V = (X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱_{ij} ∪ 𝒱⁻_{ij}. Hence, Y_{ijk} = 1 if the visualization is relevant to user i, whereas Y_{ijk} = 0 otherwise, i.e., V = (X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱⁻_{ij}. The goal is to have the model score Ŷ_{ijk} ∈ [0, 1] of each training visualization V = (X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱_{ij} ∪ 𝒱⁻_{ij} for a user i be as close as possible to the ground-truth label Y_{ijk}. The neural personalized visualization recommendation model is learned by optimizing the likelihood of the model scores for all visualizations of each user. Given a user i ∈ [n] and the model parameters Θ, the likelihood is

    P(𝒱̂⁻_i, 𝒱_i | Θ) = ∏_{j=1}^{|𝒳_i|} ∏_{(X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱_{ij}} Ŷ_{ijk} ∏_{(X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱̂⁻_{ij}} (1 − Ŷ_{ijk}),   for i = 1, ..., n    (45)

where Ŷ_{ijk} is the predicted score of a visualization V = (X^{(k)}_{ij}, C_{ijk}) for user i and dataset j (X_{ij} ∈ 𝒳_i). Naturally, the goal is to obtain predicted scores as close as possible to the actual ground-truth labels. Taking the negative log of the likelihood in Eq. 45 and summing over all n users and their sets of relevant visualizations 𝒱_{ij} from the |𝒳_i| different datasets gives the total loss L:

    L = Σ_{i=1}^{n} Σ_{j=1}^{|𝒳_i|} ( − Σ_{(X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱_{ij}} log Ŷ_{ijk} − Σ_{(X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱̂⁻_{ij}} log(1 − Ŷ_{ijk}) )
      = − Σ_{i=1}^{n} Σ_{j=1}^{|𝒳_i|} Σ_{(X^{(k)}_{ij}, C_{ijk}) ∈ 𝒱_{ij} ∪ 𝒱̂⁻_{ij}} [ Y_{ijk} log Ŷ_{ijk} + (1 − Y_{ijk}) log(1 − Ŷ_{ijk}) ]    (46)

where the objective function above is minimized via stochastic gradient descent (SGD) to update the model parameters Θ.
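A minimal numpy sketch of the Neural PVisRec forward pass (Eqs. 38-42), the per-visualization cross-entropy term of Eq. 46, and the Neural PVisRec-CMF combination of Eq. 44; the variable names and weight shapes are illustrative, and a real implementation would use a deep-learning framework trained with mini-batch Adam as described above.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neural_pvisrec_score(u_i, z_t, attr_embs, weights, biases, h_out):
    """Eqs. 38-42: concatenate the user, visual-configuration, and attribute
    embeddings (attr_embs zero-padded to s slots), apply L ReLU layers, then a
    sigmoid output layer with edge weights h_out."""
    q = np.concatenate([u_i, z_t] + attr_embs)   # phi(V), Eq. 38
    for W, b in zip(weights, biases):            # hidden layers, Eqs. 39-41
        q = relu(W @ q + b)
    return sigmoid(h_out @ q)                    # output layer, Eq. 42

def bce_term(y_true, y_hat, eps=1e-12):
    """Per-visualization term of the total loss in Eq. 46."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -(y_true * np.log(y_hat) + (1.0 - y_true) * np.log(1.0 - y_hat))

# Neural PVisRec-CMF (Eq. 44): blend the factorization score with the network
# score using the hyperparameter alpha in (0, 1).
# y_hat = (1 - alpha) * y_pvisrec + alpha * y_dnn
```

Following the tower pattern above, for an embedding size of 8 the hidden layers would have 32, 16, and 8 units, respectively.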
Since this is the first work that addresses the personalized visualization recommendation problem, there were no existing public datasets that could be used directly for our problem. Recent works have ignored the user information [Hu et al. 2019; Qian et al. 2020] that identifies the "author" of a visualization, which is required in this work for user-level personalization. As an aside, VizML [Hu et al. 2019] discarded all user information and kept only the attributes used in an actual visualization (and therefore did not consider complete datasets either). In this work, since we focus on the personalized visualization recommendation problem, we derive a user-centered dataset where for each user we know their datasets, visualizations, attributes, and visualization-configurations. We started from the raw Plot.ly community feed data. For the personalized visualization recommendation problem, we first extract the set of all n users in the visualization corpus. For each user i ∈ [n], we then extract the set of datasets 𝒳_i of interest to that user; these are the datasets for which user i has generated at least one visualization. Depending on the visualization corpus, this could also be based on other types of user feedback, such as visualizations that a user liked or clicked. Next, we extract the set of user-preferred visualizations 𝒱_{ij} for each of the datasets X_{ij} ∈ 𝒳_i of interest to user i. Hence, 𝒱_{ij} is the set of visualizations generated (or liked, clicked, ...) by user i for dataset j (X_{ij}). Every visualization V ∈ 𝒱_{ij} preferred by user i naturally contains the attributes from dataset X_{ij} ∈ 𝒳_i used in the visualization (i.e., the attributes that map to the x, y, binning, color, and so on).

In Table 4, we report statistics about the personalized visualization corpus used in our work, including the number of users, attributes, datasets, visualizations, and visualization-configurations extracted from all the user-generated visualizations. The corpus 𝒟 = {𝒳_i, 𝒱_i}_{i=1}^{n} for learning individual personalized visualization recommendation models consists of a total of n = 17,469 users with |⋃_{i=1}^{n} 𝒳_i| = 94,419 datasets used by those users. Further, there are m = 2,303,033 attributes among the 94,419 datasets of interest to the 17.4k users. Our user-centric visualization training corpus 𝒟 has a total of |⋃_{i=1}^{n} 𝒱_i| = 32.3k relevant visualizations from the n = 17.4k users, an average of 1.85 relevant visualizations per user. Each user in the corpus has an average of 5.41 datasets, and each dataset has an average of 24.39 attributes. From the 32.3k user-relevant visualizations of the 17.4k users, we extracted a total of |𝒞| = 686 unique visual-configurations. To further advance research on personalized visualization recommender systems, we have made the user-level Plot.ly data that we used for studying the personalized visualization recommendation problem (introduced in Section 2.3) publicly accessible at:

http://networkrepository.com/personalized-vis-rec
http://vizml-repository.s3.amazonaws.com/plotly_full.tar.gz

Table 4. Personalized visualization recommendation data corpus. This user-centric dataset is used for learning personalized visualization recommendation models for individual users.

Users (n) | 17,469
Datasets | 94,419
Attributes (m) | 2,303,033
Relevant visualizations | 32.3k
Visual-configurations (|𝒞|) | 686
Density (A) | <0.0001
Density (C) | <0.0001
Density (D) | <0.0001
Density (M) | 0.4130

We have also made the graph representations used in our personalized visualization recommendation framework publicly accessible at http://networkrepository.com/personalized-vis-rec-graphs
To investigate the effectiveness of the personalized visualization recommendation approach, we design experiments to answer the following research questions:
• RQ1: Given a user and a new dataset of interest to that user, can we accurately recommend the top most relevant visualizations for that specific user (Section 6.1)?
• RQ2: How do our user-level personalized visualization recommendations compare to non-personalized global recommendations (Section 6.2)?
• RQ3: Can we significantly reduce the space requirements of our approach by trading off a small amount of accuracy for a large improvement in space (Section 6.3)?
• RQ4: Do the neural personalized visualization recommendation models further improve performance by incorporating a multilayer deep neural network component (Section 6.4)?
Table 5. Personalized Visualization Recommendation Results. Note d = 10. See text for discussion.

Model | HR@1 | HR@2 | HR@3 | HR@4 | HR@5 | NDCG@1 | NDCG@2 | NDCG@3 | NDCG@4 | NDCG@5
VizRec | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A
VisPop | 0.186 | 0.235 | 0.255 | 0.271 | 0.289 | 0.181 | 0.214 | 0.224 | 0.231 | 0.238
VisConfigKNN | 0.026 | 0.030 | 0.038 | 0.055 | 0.089 | 0.016 | 0.021 | 0.026 | 0.034 | 0.048
VisKNN | 0.147 | 0.230 | 0.297 | 0.372 | 0.449 | 0.143 | 0.195 | 0.227 | 0.257 | 0.286
eALS | 0.304 | 0.395 | 0.426 | 0.441 | 0.449 | 0.302 | 0.360 | 0.376 | 0.382 | 0.385
MLP | 0.218 | 0.452 | 0.601 | 0.671 | 0.715 | 0.211 | 0.357 | 0.435 | 0.465 | 0.483
PVisRec | 0.630 | – | – | – | – | 0.624 | – | – | – | –
Now we evaluate the system for recommending personalized visualizations to a user. Given an arbitrary user, we know the visualization(s) they preferred for each of their datasets of interest; therefore, we can quantitatively evaluate the proposed approach. For each user, we randomly select one of their datasets for which the user has manually created at least two visualizations (treated as positive examples), randomly select one of those positive visualizations for testing, and use the remaining positive instances for training and validation. This is similar to the leave-one-out evaluation widely used in traditional user-item recommender systems [He et al. 2016]. However, in our case we have thousands of datasets, and for each dataset there is a large and completely disjoint set of possible visualizations to recommend to that user. Since it is too computationally expensive to rank all visualizations for every user (and every dataset of interest) during evaluation, we randomly sample 19 visualizations that were not created by the user. This gives us a total of 20 visualizations per user (1 relevant + 19 non-relevant visualizations) to use for evaluating the personalized visualization recommendations from our proposed models. Using this held-out set of user visualizations, we evaluate the ability of the proposed approach to recommend the held-out relevant visualizations (which the user actually created) among the exponential number of alternative visualizations that arise for a single dataset from its set of attributes and sets of design choices (e.g., chart-types, ...). In particular, given a user i and a dataset of interest to that user, we use the proposed approach to recommend the top-k visualizations personalized for that specific user and dataset. To quantitatively evaluate the personalized ranking of visualizations, we use rank-based evaluation metrics: Hit Ratio at K (HR@K) and Normalized Discounted Cumulative Gain (NDCG@K) [He et al. 2016]. Intuitively, HR@K quantifies whether the held-out relevant (user-generated) visualization appears in the top-K ranked visualizations or not, while NDCG@K takes into account the position of the relevant visualization in the top-K ranked list by assigning larger scores to visualizations ranked more highly. For both HR@K and NDCG@K, we report K = 1, ..., 5. Given a user i along with the set of relevant and non-relevant visualizations 𝒱_{ij} ∪ 𝒱⁻_{ij} for that user and their dataset X_{ij} ∈ 𝒳_i of interest, we derive a score for each visualization V ∈ (𝒱_{ij} ∪ 𝒱⁻_{ij}), where |𝒱_{ij}| + |𝒱⁻_{ij}| = 20. An effective personalized visualization recommender will assign larger scores to the relevant visualizations and smaller scores to the non-relevant ones, so that the relevant visualizations show up first, followed by the non-relevant visualizations further down the list.
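A small sketch of the two rank-based metrics under the 20-candidate protocol just described (1 relevant + 19 sampled non-relevant); with a single relevant item per list, NDCG@K reduces to a logarithmic discount of the relevant item's rank.

```python
import numpy as np

def hr_at_k(rank, k):
    """Hit Ratio@K: 1 if the held-out relevant visualization is ranked within
    the top-K of the candidates, else 0 (rank is 0-based)."""
    return float(rank < k)

def ndcg_at_k(rank, k):
    """NDCG@K for a single relevant item: 1/log2(rank + 2) if it is in the
    top-K, else 0 (the ideal DCG is 1)."""
    return 1.0 / np.log2(rank + 2) if rank < k else 0.0

# Example: scores for 20 candidates where index 0 is the relevant one.
scores = np.random.default_rng(0).random(20)
rank = int(np.argsort(-scores).tolist().index(0))  # position of relevant item
print(hr_at_k(rank, 5), ndcg_at_k(rank, 5))
```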
Unless otherwise mentioned, we use d = 10 as the embedding size and use the full meta-feature matrix M; for the neural variants of our approach, we use a fixed value of α (Eq. 44). Since the personalized visualization recommendation problem introduced in Section 2 is new, there are no existing visualization recommendation methods that can be directly applied to solve it. For instance, VizRec [Mutlu et al. 2016] is the closest existing approach, yet it cannot be used, since it explicitly assumes a single dataset for which users provide feedback about visualizations pertaining to that dataset of interest. However, in our problem formulation and corpus 𝒟 = {(𝒳_i, 𝒱_i)}_{i=1}^{n}, every user i ∈ [n] can have their own set of datasets 𝒳_i that are not shared by any other user; in such cases, it is impossible to use VizRec. Nevertheless, we adapted a wide variety of methods to use as baselines for evaluation, which we briefly summarize below (a small sketch of the VisPop baseline follows the list). Note that the set of candidate visualizations for a specific dataset is not only disjoint (i.e., completely different from any set of visualizations generated from another dataset), but also exponential in the number of attributes and possible design choices, making this problem unique and fundamentally challenging.
Table 6. Ablation study results for different variants of our personalized visualization recommendation approach.

Model | HR@1 | HR@2 | HR@3 | HR@4 | HR@5 | NDCG@1 | NDCG@2 | NDCG@3 | NDCG@4 | NDCG@5
PVisRec (A, C, M only) | 0.307 | 0.416 | 0.470 | 0.488 | 0.501 | 0.306 | 0.374 | 0.401 | 0.410 | 0.415
PVisRec (A, C, D only) | 0.414 | 0.474 | 0.537 | 0.610 | 0.697 | 0.384 | 0.435 | 0.450 | 0.457 | 0.460
PVisRec | 0.630 | – | – | – | – | 0.624 | – | – | – | –

• VisPop: Given a visualization V with attributes X^{(k)} and visual-configuration C ∈ 𝒞, the score of V is φ(V) = f(C) ∏_{x ∈ X^{(k)}} f(x), where f(x) is the frequency of attribute x (the column sums of A) and f(C) is the frequency of visual-configuration C. Hence, the VisPop score is the product of the frequencies of the underlying visualization components, i.e., the visual-configuration and the attributes used in the visualization being scored (see the sketch after this list).
• VisKNN: The standard item-based collaborative filtering method adapted to the visualization recommendation problem. Given a visualization V with attributes X^{(k)} and visual-configuration C ∈ 𝒞, we score V by taking the mean score of the visual-configurations most similar to C, together with the mean score of the top attributes most similar to each attribute used in the visualization.
• VisConfigKNN: Similar to VisKNN, but uses only the visual-configuration matrix to score the visualizations.
• eALS: An adaptation of the state-of-the-art matrix factorization method for item recommendation from [He et al. 2016]. We adapted it to our visualization recommendation problem by minimizing squared loss while treating all unobserved user interactions with attributes and visual-configurations as negative examples, weighted non-uniformly by the frequency of attributes and visual-configurations.
• MLP: A multilayer perceptron with three hidden layers, optimized with mini-batch Adam and a learning rate of 0.001; ReLU is used as the activation function of the MLP layers. For fairness, the last hidden layer is set to the embedding size.
• VizRec [Mutlu et al. 2016]: For each dataset, this approach constructs a user-by-visualization matrix and uses it to obtain the average overall rating of a visualization among similar users, where a user is similar if they rated a visualization preferred by the active user. VizRec assumes a single dataset and is only applicable when a large number of users have rated visualizations from the same dataset.
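For concreteness, a minimal sketch of the VisPop baseline described in the first bullet above; the frequency vectors are assumed to be the column sums of A and C.

```python
import numpy as np

def vispop_score(attr_idx, config_idx, attr_freq, config_freq):
    """VisPop baseline: score a visualization by the product of the frequency
    of its visual-configuration and the frequencies of its attributes."""
    score = config_freq[config_idx]
    for j in attr_idx:
        score *= attr_freq[j]
    return score

# attr_freq: column sums of A; config_freq: column sums of C (1-D arrays), e.g.
# score = vispop_score([3, 7], 11, np.asarray(A.sum(axis=0)).ravel(),
#                      np.asarray(C.sum(axis=0)).ravel())
```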
We provide the results in Table 5. Overall, the proposed approach, PVisRec, significantly outperforms the baseline methods by a large margin. Strikingly, PVisRec consistently achieves the best HR@K and NDCG@K across all K = 1, 2, ..., 5. From Table 5, we see that PVisRec achieves a mean relative improvement of 107.2% and 106.6% over the best performing baseline method (eALS) for HIT@1 and NDCG@1, respectively. Comparing HIT@5 and NDCG@5, PVisRec achieves a mean improvement of 29.8% and 64.9% over the next best performing method (MLP). As an aside, VizRec is the only prior approach proposed for ranking visualizations; all other methods used in our comparison are new and, to the best of our knowledge, have never been extended for ranking and recommending visualizations. Recall that VizRec in itself solves a different problem, but we point this out since it is clearly the closest. As discussed in Section 7, all of the assumptions required by VizRec are unrealistic in practice. This is also true when using VizRec for our problem and corpus 𝒟 = {(𝒳_i, 𝒱_i)}_{i=1}^{n}, where every user i ∈ [n] can have their own set of datasets 𝒳_i that are not shared by any other user; in such cases, we use "N/A" to denote this fact. It stems from the VizRec assumption that there is a single dataset of interest to all n users, and that every user has given many different preferences on the relevant visualizations generated for that specific dataset; all of these assumptions are violated in our problem. Figure 4 shows the mean performance of the top-K visualization recommendations for K = 1, 2, ..., 10. These results demonstrate the effectiveness of our user-personalized visualization recommendation approach, as we are able to successfully recommend to users the held-out visualizations that they previously created.

Fig. 4. Evaluation of top-K personalized visualization recommendations (HR@K and NDCG@K for VisPop, VisConfigKNN, VisKNN, eALS, MLP, and PVisRec).

Fig. 5. Ablation study results for personalized visualization recommendation with varying embedding dimensions d ∈ {1, ..., 1024} and HIT@k for k = 1, ..., 10. See text for discussion.
Previously, we observed that PVisRec significantly outperforms the other methods for the personalized visualization recommendation problem. To understand the importance of the different model components of PVisRec, we investigate a few variants of our personalized visualization recommendation model. The first variant, PVisRec (A, C, M only), does not use the attribute-by-visual-configuration graph represented by the sparse adjacency matrix D, whereas the second variant, PVisRec (A, C, D only), does not use the dense meta-feature matrix M for learning. This is in contrast to PVisRec, which uses A, C, D, and M. In Table 6, we see that both variants perform worse than PVisRec, indicating the importance of using all the graph representations for learning the personalized visualization recommendation model. Further, PVisRec (A, C, D only) outperforms the other variant across both ranking metrics and across all K, suggesting that D may be more important for learning than M. Nevertheless, the best personalized visualization recommendation performance is obtained when both D and M are used along with A and C. Finally, these two simpler variants still perform better than the baselines for HR@1 and NDCG@1, as shown in Table 5.

To understand the effect of the embedding size d on the performance of our personalized visualization recommendation approach, we vary the dimensionality of the embeddings from d = 1 to d = 1024. In these experiments, we use PVisRec with the full-rank meta-feature matrix M and not the compressed meta-feature embedding (MFE) matrix. In Figure 5, we show HR@k for k = 1, ..., 10 with varying embedding dimensions d ∈ {1, ..., 1024}; similarly, in Figure 6, we show NDCG@k for k = 1, ..., 10 while varying the embedding size d ∈ {1, ..., 1024}. For both HR@K and NDCG@K, we observe in Figures 5-6 that performance typically increases as a function of the embedding dimension d. We also observe that for HIT@1 and nDCG@1, the best performance is achieved at an intermediate embedding size; once d becomes too large, we observe a large drop in performance, which is due to overfitting. For instance, in Figure 5, we see a clear drop at the largest values of d.

Fig. 6. Ablation study results for personalized visualization recommendation with varying embedding dimensions d ∈ {1, ..., 1024} and nDCG@k for k = 1, ..., 10. See text for discussion.

To answer RQ2, we compare the personalized visualization recommendation model (PVisRec) to a non-personalized ML model. More specifically, we compare the user-specific personalized visualization recommendation model (PVisRec) to a global non-personalized ML-based method that does not leverage a user-specific personalized model for each user.
Table 7. Results comparing Non-personalized vs. Personalized Visualization Recommendation.

Model | HR@1 | HR@2 | HR@3 | HR@4 | HR@5 | NDCG@1 | NDCG@2 | NDCG@3 | NDCG@4 | NDCG@5
Non-personalized | 0.151 | 0.248 | 0.319 | 0.373 | 0.404 | 0.145 | 0.209 | 0.244 | 0.268 | 0.280
Personalized | 0.630 | – | – | – | – | 0.624 | – | – | – | –

For fairness, the personalized model simply leverages the specific user's embedding, while for the non-personalized model we derive an aggregate global embedding of a typical user and leverage this global non-personalized model to rank the visualizations. More formally, the non-personalized ML-based approach uses a global user embedding derived as

    u_g = (1/n) Σ_{i=1}^{n} U_i    (47)

where u_g is called the global user embedding and represents the centroid of the user embeddings from PVisRec. Everything else remains the same as in the personalized visualization recommendation approach. More formally, given a user i along with an arbitrary visualization V = (X^{(k)}, C_t) generated from some dataset X, we derive a score for the visualization V using the global user embedding u_g from Eq. 47 as follows:

    φ_g(V) = u_g Z_{t,:}^⊤ ∏_{x_j ∈ X^{(k)}} u_g V_{j,:}^⊤    (48)

where X^{(k)} is the subset of attributes used in the visualization V from the dataset X (hence |X^{(k)}| ≤ |X|) and C_t ∈ 𝒞 is the visual-configuration of V. Hence, instead of leveraging user i's personalized visualization recommendation model to obtain a user-personalized score for visualization V (that is, φ(V) = U_{i,:} Z_{t,:}^⊤ ∏_{x_j ∈ X^{(k)}} U_{i,:} V_{j,:}^⊤), we replace U_{i,:} with the global user embedding u_g representing a "typical" user. The results are provided in Table 7. For both models in Table 7, we use the same experimental setup from Section 6.1: the PVisRec model is used for learning U, and Eq. 47 is then used for the non-personalized model. Notably, the non-personalized approach that uses the same global user model for all users performs significantly worse (as shown in Table 7) than the user-level personalized approach, which leverages the appropriate learned model to personalize the ranking of visualizations with respect to the user at hand. This demonstrates the significance of learning individual models for each user, personalized based on the user's attribute/data preferences along with their visual design-choice preferences (RQ2).
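A minimal sketch of the non-personalized scoring of Eqs. 47-48: the centroid of the user embeddings replaces the user-specific embedding in the Eq. 35 scoring rule.

```python
def global_user_score(t, attr_idx, U, V, Z):
    """Non-personalized baseline (Eqs. 47-48): score a visualization with
    the centroid of all user embeddings instead of a specific user's."""
    u_g = U.mean(axis=0)          # global "typical user" embedding, Eq. 47
    score = u_g @ Z[t]            # configuration affinity
    for j in attr_idx:            # attribute affinities, as in Eq. 35
        score *= u_g @ V[j]
    return score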
In this section, we investigate using a low-rank meta-feature embedding matrix to significantly improve the space-efficiency of our proposed approach. In particular, we replace the original meta-feature matrix M with a low-rank approximation that captures the most important and meaningful meta-feature signals in the data. In addition to significantly reducing the space requirements of PVisRec, we also investigate the performance when the low-rank meta-feature embeddings are used, and the space/accuracy trade-off as the number of meta-feature embedding dimensions varies over {1, 2, 4, 8, 16}. We set d = 10 and vary the dimensionality of the meta-feature embeddings (MFE) over {1, 2, 4, 8, 16} across the different proposed approaches. We provide the results in Table 8 for the space-efficient variants of our personalized visualization recommendation methods that use meta-feature embeddings.

Table 8. Space vs. Accuracy Trade-off Results using Meta-Feature Embeddings (MFE). Results for the space-efficient variants of our personalized visualization recommendation methods that use meta-feature embeddings. In particular, we set d = 10 and vary the dimensions of the meta-feature embeddings over {1, 2, 4, 8, 16}. See text for discussion.
Model | MFE dim. | HR@1 | HR@2 | HR@3 | HR@4 | HR@5 | NDCG@1 | NDCG@2 | NDCG@3 | NDCG@4 | NDCG@5
PVisRec (A, C, M only) | 1 | 0.284 | 0.413 | 0.480 | 0.512 | 0.529 | 0.282 | 0.364 | 0.398 | 0.412 | 0.418
PVisRec (A, C, M only) | 2 | 0.245 | 0.348 | 0.395 | 0.417 | 0.429 | 0.244 | 0.308 | 0.333 | 0.342 | 0.346
PVisRec (A, C, M only) | 4 | 0.265 | 0.388 | 0.444 | 0.468 | 0.481 | 0.263 | 0.341 | 0.369 | 0.380 | 0.385
PVisRec (A, C, M only) | 8 | 0.304 | 0.419 | 0.462 | 0.492 | 0.506 | 0.302 | 0.376 | 0.397 | 0.410 | 0.416
PVisRec (A, C, M only) | 16 | 0.294 | 0.404 | 0.452 | 0.471 | 0.483 | 0.292 | 0.362 | 0.386 | 0.395 | 0.399
PVisRec | 1 | 0.467 | 0.589 | 0.641 | 0.667 | 0.681 | 0.464 | 0.542 | 0.569 | 0.580 | 0.585
PVisRec | 2 | 0.542 | 0.685 | 0.744 | 0.771 | 0.792 | 0.539 | 0.630 | 0.660 | 0.672 | 0.680
PVisRec | 4 | 0.544 | 0.713 | 0.779 | 0.815 | 0.829 | 0.541 | 0.649 | 0.682 | 0.698 | 0.704
PVisRec | 8 | 0.608 | 0.806 | 0.874 | 0.906 | 0.925 | 0.604 | 0.731 | 0.765 | 0.779 | 0.787
PVisRec | 16 | 0.616 | 0.794 | 0.865 | 0.896 | 0.916 | 0.613 | 0.726 | 0.762 | 0.776 | 0.784

Overall, in nearly all cases we obtain similar HR@K and NDCG@K compared to the original variants, while obtaining a significantly more compact model with orders of magnitude less space. For instance, when the MFE dimension is 16, PVisRec achieves a HIT@1 of 0.616, compared to 0.630 using the original 1006-dimensional meta-feature matrix, which requires roughly 63x more space than the 16-dimensional MFE variant. As an aside, since PVisRec (A, C, D only) does not use M, it does not have a meta-feature embedding (MFE) variant. These results imply that we can indeed significantly reduce the space requirements of our approach by trading off only a tiny amount of accuracy (RQ3).
In this section, we study the performance of the proposed Neural Personalized Visualization Recommendation models (RQ4). For these experiments, we use d = 10 and a fixed value of α. The results are provided in Table 9: both neural models achieve the best overall performance across all top-K personalized visualization recommendations. This is expected, since the neural visualization models all leverage the graph-based PVisRec model in some fashion. Neural PVisRec uses the learned low-dimensional embeddings of the users, visual-configurations, attributes, and meta-features of the attributes as input to the first layer, whereas Neural PVisRec-CMF also uses the learned low-dimensional embeddings, but additionally combines the predicted visualization scores from the PVisRec model for each user with the predicted scores from the neural component. Notably, both neural personalized visualization recommendation models outperform the simpler and faster graph-based approach. In Table 9, Neural PVisRec-CMF outperforms the simpler Neural PVisRec network; this holds for HR@K and NDCG@K, and across all top-K personalized visualization recommendations, K ∈ {1, ..., 5}.

Neural PVisRec is flexible and can leverage any nonlinear activation function for the fully-connected layers of our multilayer neural network architecture for personalized visualization recommendation. In Table 10, we compare three nonlinear activation functions σ for learning a personalized visualization recommendation model: hyperbolic tangent (tanh) σ(x) = tanh(x), sigmoid σ(x) = 1/(1 + exp[−x]), and ReLU σ(x) = max(0, x).
Table 9. Results for the Neural Personalized Visualization Recommendation Models.

Model | HR@1 | HR@2 | HR@3 | HR@4 | HR@5 | NDCG@1 | NDCG@2 | NDCG@3 | NDCG@4 | NDCG@5
Neural PVisRec | 0.656 | 0.825 | 0.889 | 0.923 | 0.946 | 0.652 | 0.761 | 0.793 | 0.808 | 0.817
Neural PVisRec-CMF | 0.762 | 0.879 | 0.922 | 0.944 | 0.961 | 0.729 | 0.822 | 0.845 | 0.855 | 0.861

The results in Table 10 show that ReLU performs best by a large margin, followed by sigmoid and then tanh. ReLU likely performs well due to its ability to avoid saturation, handle sparse data, and be less likely to overfit.
Table 10. Ablation study results of Neural PVisRec with different nonlinear activation functions. We report HR@1 for brevity. All results use d = 10 and the same fixed α.

Model | tanh | sigmoid | ReLU
Neural PVisRec | 0.615 | 0.624 | 0.656
Neural PVisRec-CMF | 0.613 | 0.640 | 0.762
To understand the impact of the number of layers on the performance of the neural personalized visualization recommendation models, we vary the number of hidden layers L ∈ {1, 2, 3, 4}. In Table 11, performance increases as additional hidden layers are included, and begins to decrease at L = 4; the best performance is achieved with three hidden layers. This result indicates the benefit of deep learning for personalized visualization recommendation.
Recall that our network structure follows a tower pattern in which the size of each successive layer is halved. In this experiment, we investigate larger layer sizes while fixing the final output embedding size to 8 and using 4 hidden layers. In Table 12, we observe a significant improvement in the visualization ranking when using larger layer sizes.
Neural PVisRec is also fast to train, taking on average 10.85 seconds using the large personalized visualization corpus from Section 5. The other neural visualization recommender, Neural PVisRec-CMF, is nearly as fast, since it adds only a step that is linear in the output embedding size. For these experiments, we used a 2017 MacBook Pro with 16GB of memory and a 3.1GHz Intel Core i7 processor.
Table 11. Comparing performance of Neural PVisRec with different numbers of hidden layers.

Hidden Layers | HR@1 | HR@2 | HR@3 | HR@4 | HR@5 | NDCG@1 | NDCG@2 | NDCG@3 | NDCG@4 | NDCG@5
1 | 0.579 | 0.773 | 0.844 | 0.880 | 0.896 | 0.578 | 0.701 | 0.737 | 0.752 | 0.758
2 | 0.618 | 0.801 | 0.865 | 0.892 | 0.907 | 0.618 | 0.733 | 0.765 | 0.777 | 0.783
3 | 0.656 | 0.825 | 0.889 | 0.923 | 0.946 | 0.652 | 0.761 | 0.793 | 0.808 | 0.817
4 | 0.646 | 0.754 | 0.813 | 0.842 | 0.869 | 0.499 | 0.639 | 0.680 | 0.694 | 0.705
Table 12. Varying layer sizes in the deep personalized visualization recommendation model (Neural PVisRec).

Layer Sizes | HR@1 | HR@2 | HR@3 | HR@4 | HR@5
8-16-32-64 | 0.701 | 0.790 | 0.832 | 0.865 | 0.883
8-32-128-512 | 0.734 | 0.797 | 0.846 | 0.874 | 0.886
8-48-288-1728 | 0.752 | 0.822 | 0.869 | 0.895 | 0.913
Rule-based visualization recommendation systems such as Voyager [Vartak et al. 2017; Wongsuphasawat et al. 2015, 2017], VizDeck [Perry et al. 2013], and DIVE [Hu et al. 2018] use a large set of rules defined manually by domain experts to recommend appropriate visualizations that satisfy the rules [Casner 1991; Derthick et al. 1997; Feiner 1985; Lee 2020; Mackinlay 1986; Mackinlay et al. 2007a; Roth et al. 1994; Seo and Shneiderman 2005; Stolte et al. 2002]. Such rule-based systems do not leverage any training data for learning or user personalization. There have been a few "hybrid" approaches that combine some form of learning with manually defined rules for visualization recommendation [Moritz et al. 2018]; e.g., Draco learns weights for rules (constraints) [Moritz et al. 2018]. Recently, there has been work focused on the end-to-end ML-based visualization recommendation problem [Dibia and Demiralp 2019; Qian et al. 2020]. However, this work learns a global visualization recommendation model that is agnostic of the user, and thus cannot be used for the personalized visualization recommendation problem studied in our work. All of the existing rule-based [Hu et al. 2018; Perry et al. 2013; Vartak et al. 2017; Wongsuphasawat et al. 2015, 2017], hybrid [Moritz et al. 2018], and pure ML-based visualization recommendation [Qian et al. 2020] approaches are unable to recommend personalized visualizations for specific users. These approaches do not model users, but focus entirely on learning or manually defining visualization rules that capture the notion of an effective visualization [Cui et al. 2019; Dang and Wilkinson 2014; Demiralp et al. 2017; Elzen and Wijk 2013; Key et al. 2012; Lee et al. 2019a; Lin et al. 2020; Mackinlay et al. 2007b; Siddiqui et al. 2016; Vartak et al. 2015; Wilkinson and Wills 2008; Wills and Wilkinson 2010]. Therefore, no matter the user, such a model always gives the same recommendations. The closest existing work is VizRec [Mutlu et al. 2016]. However, VizRec is only applicable when there is a single dataset shared by all users (and therefore a single small set of visualizations that the users have explicitly liked and tagged). This problem setting is unrealistic, with many impractical assumptions that do not align with practice. Nevertheless, the problem solved by that prior work is a simple special case of the personalized visualization recommendation problem introduced in our paper.
Besides visualization recommendation, there are methods that solve simpler sub-tasks such as improving expressiveness, improving perceptual effectiveness, and matching task types. These simpler sub-tasks can generally be divided into two categories [Lee 2020; Wongsuphasawat et al. 2016]: whether the solution focuses on recommending data (what data to visualize), such as Discovery-driven Data Cubes [Sarawagi et al. 1998], Scagnostics [Wilkinson et al. 2005], AutoVis [Wills and Wilkinson 2010], and MuVE [Ehsan et al. 2016], or on recommending encoding (how to design and visually encode the data), such as APT [Mackinlay 1986], ShowMe [Mackinlay et al. 2007a], and Draco-learn [Moritz et al. 2018]. While some of these are ML-based, none are able to recommend entire visualizations (nor are they personalized), which is the focus of this work. For example, VizML [Hu et al. 2019] predicts the type of a chart (e.g., bar, scatter, etc.) instead of a complete visualization.
Draco [Moritz et al. 2018] infers weights for a set of manually defined rules. VisPilot [Lee et al. 2019b] recommends different drill-down data subsets from datasets. As an aside, not only do these works not solve the visualization recommendation problem, they are also not personalized for individual users. Instead of solving simple sub-tasks such as predicting the chart type of a visualization, we focus on the end-to-end personalized visualization recommendation problem (Section 2): given a dataset of interest to user i, the goal is to automatically recommend the top-k most effective visualizations personalized for that individual user. This paper fills the gap by proposing the first personalized visualization recommendation approach that is completely automatic, data-driven, and, most importantly, recommends personalized visualizations based on a user's previous feedback, behavior, and interactions with the system.
In traditional item-based recommender systems [Adomavicius and Tuzhilin 2005; Noel et al. 2012; Ricci et al. 2011; Zhang et al. 2017; Zhao et al. 2020], there is a single shared set of items (i.e., movies [Bennett et al. 2007; Covington et al. 2016; Harper and Konstan 2015], products [Linden et al. 2003], hashtags [Sigurbjörnsson and Van Zwol 2008; Wang et al. 2020], documents [Kanakia et al. 2019; Xu et al. 2020], news [Ge et al. 2020], books [Liu et al. 2014], and locations [Bennett et al. 2011; Ye et al. 2011; Zhou et al. 2019]). However, in the personalized visualization recommendation problem studied in this work, visualizations are dataset-dependent, so there is no shared set of visualizations to recommend to users. Given 𝑁 datasets, there are 𝑁 completely disjoint sets of visualizations that can be recommended; every dataset has its own separate set of relevant visualizations that are exclusive to that dataset. Therefore, in contrast to the goal of traditional item recommender systems, the goal of personalized visualization recommendation is to learn a personalized vis. rec. model for each individual user that is capable of scoring and ultimately recommending personalized visualizations to that user from any unseen dataset in the future. Some recent works have adapted various deep learning approaches for collaborative filtering [Chen et al. 2020; Guan et al. 2019; He et al. 2017; Li et al. 2020; Sedhain et al. 2015]. However, none of these works have focused on the problem of personalized visualization recommendation studied in this work. The personalized visualization recommendation problem has a few similarities with cross-domain recommendation [Gao et al. 2013; Hu et al. 2013; Man et al. 2017; Shapira et al. 2013; Tang et al. 2012]. In cross-domain item recommendation, there are only a few datasets, as opposed to the tens of thousands of different datasets in our problem [Zhao et al. 2020]. More importantly, in cross-domain item recommendation, the different datasets are assumed to share at least one mode with each other, whereas in personalized visualization recommendation, each new dataset gives rise to a completely different set of visualizations to recommend.

CONCLUSION

In this work, we introduced the problem of user-specific personalized visualization recommendation and proposed an approach for solving it. The approach learns an individual personalized visualization recommendation model for each user. In particular, each user's model is learned by taking into account the user's implicit and explicit feedback regarding their visual and data preferences, as well as the preferences of users who have explored similar datasets and visualizations. We overcome the issues of data sparsity and limited user feedback by leveraging the data and visualization preferences of similar users, even though the visualizations from those users are derived from completely different datasets; this allows the framework to learn a better recommendation model for each individual user.
In addition, we proposed a deep neural network architecture for neural personalized visualization recommendation that can learn complex non-linear relationships between the users, their attributes of interest, and their visualization preferences. This paper is a first step toward learning personalized visualization recommendation models for individual users based on their data and visualization feedback, along with the data and visual preferences of similar users. Future work should investigate and develop better machine learning models and learning techniques to further improve the personalized visualization recommendation models and the resulting recommendations for individual users.
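As a rough sketch of this kind of architecture (written under our own simplifying assumptions, not the exact model in the paper), a neural personalized scorer can embed the user, represent a candidate visualization's attributes with dataset-independent meta-features, embed its design choices, and score the concatenation with an MLP. All dimensions, feature choices, and the class name below are illustrative.

```python
import torch
import torch.nn as nn

class NeuralPersonalizedVisRec(nn.Module):
    """Illustrative sketch only: embeds a user, the (meta-featurized) attributes
    of a candidate visualization, and its design choices, then scores the
    triple with an MLP. Dimensions and features are assumptions for exposition."""
    def __init__(self, num_users, num_chart_types, attr_feat_dim, d=32):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, d)
        self.chart_emb = nn.Embedding(num_chart_types, d)
        # Attributes from unseen datasets cannot have fixed IDs, so we assume
        # they are described by dataset-independent meta-features (type, stats, ...).
        self.attr_proj = nn.Linear(attr_feat_dim, d)
        self.mlp = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, user_ids, attr_feats, chart_ids):
        u = self.user_emb(user_ids)                 # (B, d)
        a = self.attr_proj(attr_feats).mean(dim=1)  # (B, n_attrs, feat) -> (B, d)
        c = self.chart_emb(chart_ids)               # (B, d)
        return self.mlp(torch.cat([u, a, c], dim=-1)).squeeze(-1)  # (B,) scores

model = NeuralPersonalizedVisRec(num_users=17400, num_chart_types=8, attr_feat_dim=16)
scores = model(torch.tensor([3]), torch.randn(1, 2, 16), torch.tensor([1]))
print(scores.shape)  # torch.Size([1]); train with, e.g., BCE on interacted vs. sampled negatives
```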
REFERENCES
Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. TKDE 17, 6 (2005), 734–749.
Thirunavukarasu Balasubramaniam, Richi Nayak, Chau Yuen, and Yu-Chu Tian. 2020. Column-wise element selection for computationally efficient nonnegative coupled matrix tensor factorization. IEEE Transactions on Knowledge and Data Engineering (2020).
James Bennett, Stan Lanning, et al. 2007. The Netflix Prize. In KDD Cup. 35.
Paul N Bennett, Filip Radlinski, Ryen W White, and Emine Yilmaz. 2011. Inferring and using location metadata to personalize web search. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval. 135–144.
Guillaume Bouchard, Dawei Yin, and Shengbo Guo. 2013. Convex collective matrix factorization. In AISTATS. PMLR, 144–152.
Stephen M Casner. 1991. Task-analytic approach to the automated design of graphic presentations. ACM Transactions on Graphics (TOG) 10, 2 (1991), 111–151.
Chong Chen, Min Zhang, Yongfeng Zhang, Yiqun Liu, and Shaoping Ma. 2020. Efficient neural matrix factorization without sampling for recommendation. ACM Transactions on Information Systems (TOIS) 38, 2 (2020), 1–28.
Dongjin Choi, Jun-Gi Jang, and U Kang. 2019. S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization. PLoS ONE 14, 6 (2019), e0217316.
Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In RecSys. 191–198.
Zhe Cui, Sriram Karthik Badam, M Adil Yalçin, and Niklas Elmqvist. 2019. DataSite: Proactive visual data exploration with computation of insight-based recommendations. Information Visualization 18, 2 (2019), 251–267.
Tuan Nhon Dang and Leland Wilkinson. 2014. ScagExplorer: Exploring scatterplots by their scagnostics. In IEEE Pacific Visualization Symposium (PacificVis). IEEE, 73–80.
Çağatay Demiralp, Peter J Haas, Srinivasan Parthasarathy, and Tejaswini Pedapati. 2017. Foresight: Recommending visual insights. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 10.
Mark Derthick, John Kolojejchick, and Steven F Roth. 1997. An interactive visualization environment for data exploration. In KDD. 2–9.
Victor Dibia and Çağatay Demiralp. 2019. Data2Vis: Automatic generation of data visualizations using sequence-to-sequence recurrent neural networks. IEEE Computer Graphics and Applications 39, 5 (2019), 33–46.
Humaira Ehsan, Mohamed Sharaf, and Panos Chrysanthis. 2016. MuVE: Efficient multi-objective view recommendation for visual data exploration. In ICDE.
Stef van den Elzen and Jarke J. van Wijk. 2013. Small multiples, large singles: A new approach for visual data exploration. In Computer Graphics Forum, Vol. 32. 191–200.
Steven Feiner. 1985. APEX: An experiment in the automated creation of pictorial explanations. IEEE Computer Graphics and Applications 5, 11 (1985), 29–37.
Sheng Gao, Hao Luo, Da Chen, Shantao Li, Patrick Gallinari, and Jun Guo. 2013. Cross-domain recommendation via cluster-level latent factor model. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 161–176.
Suyu Ge, Chuhan Wu, Fangzhao Wu, Tao Qi, and Yongfeng Huang. 2020. Graph enhanced representation learning for news recommendation. In WWW.
Xinyu Guan, Zhiyong Cheng, Xiangnan He, Yongfeng Zhang, Zhibo Zhu, Qinke Peng, and Tat-Seng Chua. 2019. Attentive aspect modeling for review-aware recommendation. ACM Transactions on Information Systems (TOIS) 37, 3 (2019), 1–27.
F Maxwell Harper and Joseph A Konstan. 2015. The MovieLens datasets: History and context. TIIS 5, 4 (2015), 1–19.
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW. 173–182.
Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 549–558.
Kevin Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, and César Hidalgo. 2019. VizML: A machine learning approach to visualization recommendation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300358
Kevin Hu, Diana Orghian, and César Hidalgo. 2018. DIVE: A mixed-initiative system supporting integrated data exploration workflows. In Workshop on Human-In-the-Loop Data Anal.
Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, and Can Zhu. 2013. Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd International Conference on World Wide Web. 595–606.
Anshul Kanakia, Zhihong Shen, Darrin Eide, and Kuansan Wang. 2019. A scalable hybrid research paper recommender system for Microsoft Academic. In WWW.
Alicia Key, Bill Howe, Daniel Perry, and Cecilia Aragon. 2012. VizDeck: Self-organizing dashboards for visual analytics. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. 681–684.
Jingu Kim, Yunlong He, and Haesun Park. 2014. Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework. Journal of Global Optimization 58, 2 (2014), 285–319.
Doris Jung-Lin Lee. 2020. Insight Machines: The Past, Present, and Future of Visualization Recommendation. (August 2020).
Doris Jung-Lin Lee, Himel Dev, Huizi Hu, Hazem Elmeleegy, and Aditya Parameswaran. 2019a. Avoiding drill-down fallacies with VisPilot: Assisted exploration of data subsets. In Proceedings of the 24th International Conference on Intelligent User Interfaces. 186–196.
Doris Jung-Lin Lee, Himel Dev, Huizi Hu, Hazem Elmeleegy, and Aditya Parameswaran. 2019b. Avoiding drill-down fallacies with VisPilot: Assisted exploration of data subsets. In IUI. 186–196.
Xiangsheng Li, Maarten de Rijke, Yiqun Liu, Jiaxin Mao, Weizhi Ma, Min Zhang, and Shaoping Ma. 2020. Learning better representations for neural information retrieval with graph information. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 795–804.
Halden Lin, Dominik Moritz, and Jeffrey Heer. 2020. Dziban: Balancing agency & automation in visualization design via anchored recommendations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.
Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. Internet Computing 7, 1 (2003), 76–80.
Yidan Liu, Min Xie, and Laks VS Lakshmanan. 2014. Recommending user generated item lists. In RecSys. 185–192.
Jock Mackinlay. 1986. Automating the design of graphical presentations of relational information. ACM Trans. Graph. 5, 2 (1986), 110–141.
Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007a. Show Me: Automatic presentation for visual analysis. TVCG 13, 6 (2007), 1137–1144.
Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007b. Show Me: Automatic presentation for visual analysis. IEEE Transactions on Visualization and Computer Graphics 13, 6 (2007), 1137–1144.
Tong Man, Huawei Shen, Xiaolong Jin, and Xueqi Cheng. 2017. Cross-domain recommendation: An embedding and mapping approach. In IJCAI. 2464–2470.
Dominik Moritz, Chenglong Wang, Greg L Nelson, Halden Lin, Adam M Smith, Bill Howe, and Jeffrey Heer. 2018. Formalizing visualization design knowledge as constraints: Actionable and extensible models in Draco. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2018), 438–448.
Belgin Mutlu, Eduardo Veas, and Christoph Trattner. 2016. VizRec: Recommending personalized visualizations. ACM Transactions on Interactive Intelligent Systems (TIIS) 6, 4 (2016), 1–39.
Joseph Noel, Scott Sanner, Khoi-Nguyen Tran, Peter Christen, Lexing Xie, Edwin V Bonilla, Ehsan Abbasnejad, and Nicolás Della Penna. 2012. New objective functions for social collaborative filtering. In Proceedings of the 21st International Conference on World Wide Web. 859–868.
Jinoh Oh, Wook-Shin Han, Hwanjo Yu, and Xiaoqian Jiang. 2015. Fast and robust parallel SGD matrix factorization. In SIGKDD. ACM, 865–874.
Daniel B Perry, Bill Howe, Alicia MF Key, and Cecilia Aragon. 2013. VizDeck: Streamlining exploratory visual analytics of scientific data. (2013).
Xin Qian, Ryan A. Rossi, Fan Du, Sungchul Kim, Eunyee Koh, Sana Malik, Tak Yeon Lee, and Joel Chan. 2020. ML-based visualization recommendation: Learning to recommend visualizations from data. arXiv:2009.12316 (2020).
Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Recommender Systems Handbook. 1–35.
Ryan A. Rossi and Rong Zhou. 2016. Parallel collective factorization for modeling large heterogeneous networks. In Social Network Analysis and Mining. 30.
Steven F Roth, John Kolojejchick, Joe Mattis, and Jade Goldstein. 1994. Interactive graphic design using automatic presentation knowledge. In CHI. 112–117.
Sunita Sarawagi, Rakesh Agrawal, and Nimrod Megiddo. 1998. Discovery-driven exploration of OLAP data cubes. In Extending Database Technology (EDBT).
Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2016. Vega-Lite: A grammar of interactive graphics. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2016), 341–350.
Carla Schenker, Jeremy E Cohen, and Evrim Acar. 2021. An optimization framework for regularized linearly coupled matrix-tensor factorization. In EUSIPCO. 985–989.
Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015. AutoRec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web. 111–112.
Jinwook Seo and Ben Shneiderman. 2005. A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization 4, 2 (2005), 96–113.
Bracha Shapira, Lior Rokach, and Shirley Freilikhman. 2013. Facebook single and cross domain data for recommendation systems. User Modeling and User-Adapted Interaction 23, 2-3 (2013), 211–247.
Tarique Siddiqui, Albert Kim, John Lee, Karrie Karahalios, and Aditya Parameswaran. 2016. Effortless data exploration with zenvisage: An expressive and interactive visual analytics system. arXiv preprint arXiv:1604.03583 (2016).
Börkur Sigurbjörnsson and Roelof Van Zwol. 2008. Flickr tag recommendation based on collective knowledge. In WWW. 327–336.
Ajit P Singh and Geoffrey J Gordon. 2008. Relational learning via collective matrix factorization. In KDD. 650–658.
Chris Stolte, Diane Tang, and Pat Hanrahan. 2002. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. TVCG 8, 1 (2002), 52–65.
Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. 2012. Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1285–1293.
Manasi Vartak, Silu Huang, Tarique Siddiqui, Samuel Madden, and Aditya Parameswaran. 2017. Towards visualization recommendation systems. ACM SIGMOD Record 45, 4 (2017), 34–39.
Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, and Neoklis Polyzotis. 2015. SeeDB: Efficient data-driven visualization recommendations to support visual analytics. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 8. NIH Public Access, 2182.
Xueting Wang, Yiwei Zhang, and Toshihiko Yamasaki. 2020. Earn more social attention: User popularity based tag recommendation system. In WWW.
Leland Wilkinson, Anushka Anand, and Robert Grossman. 2005. Graph-theoretic scagnostics. In IEEE Symposium on Information Visualization. 157–164.
Leland Wilkinson and Graham Wills. 2008. Scagnostics distributions. Journal of Computational and Graphical Statistics (2008).
Graham Wills and Leland Wilkinson. 2010. AutoVis: Automatic visualization. Information Visualization 9, 1 (2010), 47–69.
Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2015. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Transactions on Visualization and Computer Graphics 22, 1 (2015), 649–658.
Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2016. Towards a general-purpose query language for visualization recommendation. In Workshop on Human-In-the-Loop Data Anal.
Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2017. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2648–2659.
Xuhai Xu, Ahmed Hassan Awadallah, Susan T. Dumais, Farheen Omar, Bogdan Popp, Robert Rounthwaite, and Farnaz Jahanbakhsh. 2020. Understanding user behavior for document recommendation. In WWW. 3012–3018.
Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik-Lun Lee. 2011. Exploiting geographical influence for collaborative point-of-interest recommendation. In SIGIR.
Hyokun Yun, Hsiang-Fu Yu, Cho-Jui Hsieh, SVN Vishwanathan, and Inderjit Dhillon. 2014. NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion. VLDB 7, 11 (2014), 975–986.
Yongfeng Zhang, Qingyao Ai, Xu Chen, and W Bruce Croft. 2017. Joint representation learning for top-n recommendation with heterogeneous information sources. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 1449–1458.
Cheng Zhao, Chenliang Li, Rong Xiao, Hongbo Deng, and Aixin Sun. 2020. CATN: Cross-domain recommendation for cold-start users via aspect transfer network. In SIGIR. 229–238.
Fan Zhou, Ruiyang Yin, Kunpeng Zhang, Goce Trajcevski, Ting Zhong, and Jin Wu. 2019. Adversarial point-of-interest recommendation. In WWW.