DVP: Data Visualization Platform
Waleed A. Yousef, Ahmed A. Abouelkahire, Omar S. Marzouk, Sameh K. Mohamed, Mohamed N. Alaggan
Fig. Two snapshots of DVP. Left: DVP integrated locally with Matlab on the desktop, acting as a Matlab toolbox, where all variables are communicated from/to Matlab. Right: the online version of DVP, where all actions, interactions, and dynamics can be performed.
Abstract—We identify two major steps in data analysis: data exploration, for understanding and observing patterns/relationships in data; and construction, design, and assessment of various models to formalize these relationships. For each step there exists a large set of tools and software. For the first step, many visualization tools exist, such as GGobi, Parallax, and CrystalVision, and most recently Tableau and Plotly. For the second step, many Scientific Computing Environments (SCEs) exist, such as Matlab, Mathematica, R, and Python. However, there does not exist a tool that allows for seamless two-way interaction between visualization tools and SCEs. We have designed and implemented a Data Visualization Platform (DVP) with an architecture and design that attempts to bridge this gap. DVP connects seamlessly to SCEs to bring their computational capabilities to the visualization methods in a single coherent platform. DVP is designed with two interfaces: the desktop standalone version and the online interface. To illustrate the power of the DVP design, a free demo of the online interface of DVP is available [1], and very low-level design details are explained in this article. Although DVP was launched circa 2012, the present manuscript was not published until today for commercialization and patent considerations.
Index Terms—Data Visualization, Scientific Computing, Data Analysis, Graphics Interaction, Dynamic Plots
Yousef, Waleed A., is an associate professor, [email protected].
Ahmed A. Abouelkahire, B.Sc., Senior Data Scientist, Teradata, Egypt, [email protected].
Omar S. Marzouk, M.Sc., School of Computer Science and Communication (CSC), KTH Royal Institute of Technology, Sweden, [email protected].
Sameh K. Mohamed, M.Sc., Insight Center for Data Analytics, National University of Ireland, Ireland, [email protected].
Mohammad Alaggan, is an assistant professor, [email protected].
a Human Computer Interaction Laboratory (HCI Lab.), http://hciegypt.com/, Egypt.
b Computer Science Department, Faculty of Computers and Information, Helwan University, Egypt.

1 INTRODUCTION
Data acquisition is ubiquitous; data arise from diverse areas and applications, including medical, financial, industrial, and governmental, among others. Data of size n × p consist of n records/observations, each consisting of p dimensions/features. When p increases dramatically, the data are high dimensional. An example is DNA microarray data, where the number of observations (here, patients) is in the order of hundreds, while the number of dimensions (here, genes) is in the order of thousands. When n increases dramatically, the data are called "big". An example is astronomical data, where the number of observations reaches billions! Regardless of the origin of data or its application, data analysis (including statistics, statistical learning, machine learning, and pattern recognition) is concerned with understanding data, recognizing patterns, and learning input-output relationships hiding in data. Modeling such a pattern/relationship can be described by a regression function, classification rule, clustering analysis, or mere statistical testing and summaries. This modeling is used for prediction (or decision support) and interpretation. Two steps usually are involved:
1) Data exploration and visualization, for understanding and observing patterns/relationships in data. This step involves visualizing data in many interactive and dynamic plots. Each plot conveys part of the story, which is emphasized by the interaction with each plot and the linking among different plots [see, e.g., 2–7].
The term "Data Visualization" is used interchangeably with "Exploratory Data Analysis" (EDA) and, recently and more fashionably, "Visual Data Mining"; all convey the meaning and objective behind such a step.
2) Construction, design, and assessment of the model, to formalize these patterns/relationships and to assess their statistical significance and generalization to the population of data [see, e.g., 8–13].
Each of these two steps accounts for a field by itself, with its own literature, theory, and software. Although inclusion of the two steps results in a consolidated design and great understanding of data, not all practitioners adopt such a comprehensive view when analyzing data. The need for the first step, data visualization and exploration, becomes more crucial when data become high dimensional (huge p) or "big" (huge n). This is true since modeling, analyzing, and processing data with huge n or/and p become more difficult and complicated. Data visualization and exploration reveal secrets and pave shortcuts to understanding data and building the best models. Many data visualization software packages (DVSW) exist that can produce similar results with the capabilities of "interaction" and "linking", which are not supported by any Scientific Computing Environment (SCE) such as Matlab, Mathematica, SAS, etc. Then, the right question is this: what is the need for another DVSW, and why do we propose our Data Visualization Platform (DVP)? We provide below, in bullets, an answer to this question and show how the DVP design and philosophy are important to scientists, researchers, and data analysts in different fields. Although not all of the following aspects are currently implemented, the DVP kernel is designed with eyes on the following:
• Seamless communication with any SCE, to behave as a single environment.
Current DVSWs are standalone software detached from SCEs. Any scientist, researcher, or data analyst using any SCE cannot interact with patterns visualized and discovered in the DVSW. For example, if the data analyst uses Matlab to analyze a dataset and Parallax to visualize the data, he cannot do processing on patterns discovered in Parallax; these patterns are not, of course, seen as variables in the Matlab workspace. It is impossible to iterate back and forth between the DVSW and the SCE except by tedious data export and import, which puts up hurdles. DVP is designed to interact seamlessly with any SCE, as if both were one environment, even if DVP and the SCE are running on two different machines. This is extremely important for connecting with computing clouds for analyzing big data.
• High extensibility to different plots and methods in various scientific fields.
Current DVSWs provide some visualization methods, e.g., ||-coords, scatter plot, matrix plot, projection pursuit, grand tour in 4 dimensions, etc. However, many scientific fields require more sophisticated methods. For example, graph analysis and astronomical data require Multi-Dimensional Scaling (MDS) plots [see, e.g., 2, II.6]. It is almost impossible for any DVSW to provide all the available plotting and charting methods, let alone ones being continuously developed in many fields of science. DVP, in addition to the wide range of plots it provides, is designed to provide an easy scripting language based on JavaScript that enables users to write their own plotting methods and integrate them into DVP. This will build a wider user community and enrich it with many sophisticated methods.
• Support for data from network streams and common local database servers, e.g., SQL, MySQL, and Oracle.
Many DVSWs only load data statically from local machine storage. However, nowadays, many data sources belonging to many applications are available online and updated in real time; e.g., stock market data, data of global enterprises, Yahoo data, etc. Analysts monitoring such data have to be connected all the time. DVP is designed to facilitate connection to network streaming and to different online database sources.
• Available API for interfacing with different hardware, e.g., Raspberry Pi and Arduino chips.
Data acquisition is not exclusive to software and reports; data are acquired from hardware as well, e.g., Arduino and Raspberry Pi chips. Arduino [14] is a microcontroller designed with the objective of connecting to the ambient environment; a chip has different sensors for humidity, light, moisture, etc. Raspberry Pi [15] is a credit-card-size computer; yet it is so simple that anyone can program it. DVP is designed to provide an API for interfacing with hardware devices.
• Cross-platform compatibility, e.g., Windows, Linux, Mac, and iOS.
In contrast to many available DVSWs, DVP is designed to operate across different operating systems.
• Multi-device rendering support, e.g., touch screens, big data displays, dashboards, and interactive PDFs.
Many DVSWs render only to the desktop screen; they are not designed for displaying big data on large displays, although there is a demand to render data to large displays, as we have already entered the era of big data. DVP is designed to render to small and large displays and to receive input from touch devices as well, for a wider user community and commercial needs. DVP is designed to offer business solutions for enterprises as well, by supporting web-based dashboards and online visualization. In addition, DVP is designed to produce interactive PDF documents by exporting figures and plots to PDF with the capability of interacting with those figures in the PDF document itself. This integrates reporting schemes with interactive graphics for portability and wider utility.
• Customizable figures and plots.
As opposed to many DVSWs, DVP is designed to provide full customization of its figures and plots. Moreover, the design concept behind DVP is that every activity is the result of a function call with passed parameters. The GUI actions of DVP do nothing but call those functions. This means that users can create whatever plots, figures, and new methods they want, and fully customize them with the provided scripting language.
Table 1 is a more quantitative comparison between the first version of DVP and other well-known software available in the market for either data visualization or scientific computing. The comparison is established on the 2013 versions of all of them, when DVP was launched. It is clear that the majority of aspects important to a complete visualization system are missing in the available systems. DVP is concerned with providing all of these technical features and aspects in one platform. In Section 2.4, below, we elaborate more on those features and detail each of them.
The rest of this paper is organized as follows. Section 2 presents high-level design aspects, requirements, and features of DVP. Sections 3–6 detail the design and architecture of DVP components and subsystems. To clarify the power of the DVP kernel design, we had to present in these sections some very low-level technical details at the level of variables and processes. Appendix A is a very short account and tutorial on data visualization and graphics, taken almost from textbooks, that includes the history, importance, and motivation of exploratory data analysis and the Grammar of Graphics (GoG). It is important for a reader who is not fully acquainted with the field; however, it can be trimmed out of the manuscript without affecting its coherence.
2 HIGH-LEVEL DESIGN ASPECTS
In this section we provide the high-level design aspects and philosophy of DVP; some of these aspects are implemented and others are still under development. Even the aspects that are still under development are taken care of and accounted for in the internal design and architecture of the whole system. For a full account of the implemented aspects, the reader may refer to the technical manual of DVP.
In this project of designing data visualization software, we adopted a very sophisticated design of scientific plots and figures. It is based on the so-called "Grammar of Graphics" (GoG), first proposed by [16] and adopted, e.g., in the R package for static plotting, ggplot2 [17]. The power of this very formal approach to describing plots and figures lies in its generality and in its procedural design. With GoG, graphics are not a simple render of colored points on a planar area. Rather, with GoG there is a formal language for describing graphics at some level of abstraction. This is so powerful a tool for describing new plotting methods or modifying existing ones, which is ideal for our general-purpose DVP that provides a scripting language for coding new plotting methods. For more information on GoG the reader may refer to [16].
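To make the GoG idea concrete, the following is a toy sketch of a GoG-style declarative plot specification and a "compiler" that turns it into abstract marks; all names here are ours for illustration, not DVP's actual scripting API.

```javascript
// Hypothetical GoG-style specification: data, aesthetic mappings, and
// geometry are described declaratively, then compiled into abstract
// marks; a real renderer would map these marks to SVG/canvas primitives.
const spec = {
  data: [{ x: 1, y: 2, group: "a" }, { x: 2, y: 3, group: "b" }],
  mapping: { x: "x", y: "y", color: "group" }, // aesthetic mappings
  geometry: "point",                           // geom: points (scatter)
  scales: { x: "linear", y: "linear" },
};

// Toy compiler from specification to abstract marks.
function compile(spec) {
  return spec.data.map((row) => ({
    type: spec.geometry,
    x: row[spec.mapping.x],
    y: row[spec.mapping.y],
    color: row[spec.mapping.color],
  }));
}

const marks = compile(spec);
```

The point of the separation is that a new plotting method only needs a new geometry and compiler rule, while data and mappings stay untouched.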
TABLE 1
Comparison of popular visualization software packages. 'x' denotes full support and '-' denotes partial support.

Package      | SCE support, Interactivity, Linked Plots, Extensibility, Cross Platform, Figure Customisation, Database Servers, Multiple devices
Matlab       | - - - x - x
Mathematica  | - - - x x -
SigmaPlot    | - -
OriginLab    | - -
Tableau      | - - - - - x x
PV-Wave      | x - - - x
Plot.ly      | - - - - - x
Parallax     | - x - - -
GGobi (R)    | - x - -
DVP          | x x x x x x x -
Fig. 1. DVP system architecture; (arrows indicate dependency)
There are many important plots for visualizing data that require rigorous mathematical treatment or algorithm design; e.g., Parallel Coordinates (||-coords), Projection Pursuit (PP), Force-Directed Graphs (FDG), and Multi-Dimensional Scaling (MDS), among others. One example is ||-coords, where designing useful data group selections (queries) requires good knowledge of the geometry of high dimensions in ||-coords. Useful mathematical queries have to be designed for selecting observations of particular interest, e.g., observations with a particular slope, observations with some correlation coefficient, etc. Having such queries in DVP enables us, e.g., to interact more smartly with ||-coords plots. For more information on ||-coords the reader may refer to [18]. In addition, algorithm design is needed for some queries, for example when selecting some data of interest between two parallel axes in ||-coords. This sounds trivial at first glance; however, between the axes one has to search where the GUI selection tool (e.g., mouse) is moving and intersecting with the drawn lines. Brute-force search is disastrous, if not impossible, for large data. Efficient heuristic search algorithms have to be designed, so as not to sacrifice selection accuracy for performance optimization and rendering speed.
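One possible heuristic (our own sketch, not DVP's actual algorithm) is to pre-bucket segments between two adjacent axes by their interpolated y-value at a grid of x positions, so that a mouse query inspects one small bucket instead of scanning all n lines. Values are assumed normalized to [0, 1] on both axes.

```javascript
// Between two adjacent ||-coords axes, observation i is the segment
// from (0, left[i]) to (1, right[i]).  Segments are bucketed by their
// y-value at xSteps sampled x positions, into yBins buckets per position.
function buildIndex(left, right, xSteps, yBins) {
  const index = []; // index[xi][yi] -> array of observation ids
  for (let xi = 0; xi < xSteps; xi++) {
    const t = xi / (xSteps - 1); // sampled x position in [0, 1]
    const row = Array.from({ length: yBins }, () => []);
    for (let i = 0; i < left.length; i++) {
      const y = left[i] + t * (right[i] - left[i]); // y of segment i at x = t
      const yi = Math.min(yBins - 1, Math.max(0, Math.floor(y * yBins)));
      row[yi].push(i);
    }
    index.push(row);
  }
  return index;
}

// Query: which segments pass within tol of the probe point (px, py)?
// Only the matching bucket is checked exactly, not all n segments.
function query(index, left, right, px, py, tol) {
  const xSteps = index.length, yBins = index[0].length;
  const xi = Math.round(px * (xSteps - 1));
  const yi = Math.min(yBins - 1, Math.max(0, Math.floor(py * yBins)));
  const hits = [];
  for (const i of index[xi][yi]) {
    const y = left[i] + px * (right[i] - left[i]);
    if (Math.abs(y - py) <= tol) hits.push(i);
  }
  return hits;
}
```

A production version would also probe the neighboring y-buckets, since a segment within tol of a bucket boundary may be indexed next door; the sketch omits this for brevity.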
Fig. 2. Shell application components
DVP is a complex system that provides a varied set of innovative features, and these features depend on a wide set of modules and technologies that construct the main architecture of DVP. As sketched in Figure 1, the architecture of the DVP system consists of separate cooperating parts (subsystems) that all work together to achieve the system's functionalities. These parts are responsible for providing support for cross-platform compatibility, multi-device support, communication with SCEs, and rich interactive visualization. The architecture of each subsystem is discussed below, with a short description of its functionality, structure, and provided features.
The shell application, as sketched in Figure 2, is a fully functioning web browser that runs as a container and host for web content, which makes for a perfect way to host HTML-based applications natively on the heterogeneous operating systems supported by the browser. This design can be found in applications like the Atom IDE [19], Adobe Brackets [20], and LightTable [21]. The shell application consists of two main subparts, shown in Figure 2. The first is a platform-dependent native UI component, which appears in the figure as Windows Forms for Windows, GTK for Linux, and Cocoa for Mac. The second is the Chromium Embedded Framework (CEF), which appears in the figure as Blink [22], NativeUtil, and V8 [23]. The first is responsible for cross-platform compatibility, and the second is responsible for accessing native resources, linking third-party modules, and performing file operations.
A server module, as sketched in Figure 3, acts as a middle agent between DVP and SCEs. This server module uses web sockets [24] and XML HTTP requests to establish communication between DVP and SCEs, by providing an interface for communication with the system.
Fig. 3. Communication server architecture
Fig. 4. UI application components
Since the server module uses standard web sockets, any SCE can be seamlessly integrated with DVP by implementing a module that uses the DVP interface provided by the server module. The server acts as a bridge, providing an interface to the DVP visualization and configuration functionalities for any other software. It can even communicate with regular programming languages like C++, Java, and C.
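The kind of serialization such a bridge performs can be sketched as follows; the envelope fields here are our own illustration, not DVP's actual wire format. An SCE-side plugin encodes a workspace variable (an n × p matrix) to JSON, and the DVP side decodes it back, so both sides see the same data.

```javascript
// Hypothetical message envelope for shipping an SCE variable to DVP.
function encodeVariable(name, matrix) {
  return JSON.stringify({
    type: "variable",
    name: name,
    dims: [matrix.length, matrix[0].length], // n x p
    data: matrix.flat(),                     // row-major flattening
  });
}

// DVP-side decoding back into a nested n x p array.
function decodeVariable(message) {
  const m = JSON.parse(message);
  const [n, p] = m.dims;
  const rows = [];
  for (let i = 0; i < n; i++) rows.push(m.data.slice(i * p, (i + 1) * p));
  return { name: m.name, matrix: rows };
}
```

Because the envelope is plain JSON over a web socket, any language with a JSON library and a socket can implement the same round trip.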
The UI application, as sketched in Figure 4, is implemented as a web interface using a set of web tools and a combination of state-of-the-art work from leading companies and technologies, e.g., Google, Twitter, and D3 [25]. The latter is a newly developed library for dynamic and interactive web content. The UI application will be hosted by the shell application, and together they will act as a single running application, providing compatibility across different operating systems and devices.
A very large display subsystem to explore big data is to be designed; its architecture is shown in Figure 5. This system projects data onto an area (a wall or any white screen) covered by 4 parallel very-high-definition projectors. This area is estimated to be up to 100 m². Since the data is huge, rendering is divided into 4 quadrants, each processed by a giant GPU and fed into a separate projector. The whole rendered view is composed back by the 4 projectors onto the very large display screen.
Fig. 5. Architecture of the integrated subsystem solution for big data visualization
Fig. 6. Big data is visualized on two displays simultaneously; one very large display for projecting the whole data, and another large touch screen for interacting with a portion of the data of particular interest
Since it is almost impossible to interact with data on that very large screen, another large touch screen (80 inches) is connected to the system. If the analyst is interested in some portion of the data displayed on the wall, he can select it for interaction on the large touch screen, which can be thought of as a magnifying glass in image processing software (Figure 6).
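The quadrant split that feeds the 4 rendering pipelines can be illustrated as below; this is our own sketch of the partitioning step, assuming point coordinates normalized to the unit square, with each quadrant destined for a separate GPU/projector pipeline.

```javascript
// Partition normalized points into 4 quadrants for 4 render pipelines.
// Quadrant order (screen coordinates, y grows downward):
// [top-left, top-right, bottom-left, bottom-right].
function splitIntoQuadrants(points) {
  const quadrants = [[], [], [], []];
  for (const p of points) {
    const col = p.x < 0.5 ? 0 : 1;
    const row = p.y < 0.5 ? 0 : 1;
    quadrants[row * 2 + col].push(p);
  }
  return quadrants;
}
```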
In this section, we explain all the planned features that the architectural design of DVP supports; we present them in terms of user stories.
DVP will provide a simple way to establish communication with scientific computing environments, in order to link the process of data analysis with data visualization, and also to facilitate reviewing the interpretation. A list of sub-features:
• The system can be attached to and detached from an SCE.
• Connection with single or multiple instances of SCEs.
• Connection can be made to SCEs within the same machine, over a LAN, or even on a remote machine.
The system has a set of features related to data manipulation, listed as follows:
• Imported data will be categorized into one of three types: quantitative, categorical, or ordered categorical; this early categorization will help provide suggestions about the set of figures and plots that visualize the data best.
• DVP will support the most common data formats used by data analysts, like CSV, XML, JSON, or SGML.
• Data can be grabbed from online sources like Google Drive, social media analytics, the stock market, or even other storage types, e.g., SQL or Excel sheets.
• The system will facilitate processing of acquired data using a basic set of operations, like merging datasets from various origins.
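The early categorization into quantitative/categorical/ordered-categorical could be done with a heuristic like the following; the rule and function name are our own illustration of the idea, not DVP's implementation.

```javascript
// Hypothetical type inference for an imported column: a column whose
// values are all numeric is "quantitative"; otherwise it is
// "categorical", and "ordered-categorical" when the caller supplies an
// explicit level order.
function categorizeColumn(values, levelOrder) {
  const allNumeric = values.every(
    (v) => typeof v === "number" && !Number.isNaN(v)
  );
  if (allNumeric) return "quantitative";
  return levelOrder ? "ordered-categorical" : "categorical";
}
```

The inferred type can then drive plot suggestions, e.g., histograms for quantitative columns and bar charts for categorical ones.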
DVP provides a varied set of figures and innovative visualization methodologies that are the result of research in the field of data visualization. A list of sub-features:
• The system will provide a UI wizard to fill in a figure's required parameters.
• Figures will have a UI selection mechanism to create, modify, and delete groups of figure objects.
• Selection groups can be created using a mapping function to the data source.
• Annotations can be made to the figure itself or to one of its objects.
• Annotation will include creating arrows, circles, and polygons.
• A figure view can be transmitted to the SCE.
Making DVP open to extensions will give users the power to implement their own extensions that help them with the process of visualizing or interacting with data, and will build a wider community that uses DVP. This is achieved by providing a scripting language to facilitate a set of key features:
• Users can develop custom plots that can be embedded into DVP.
• Users will also be able to implement transformation functions that transform data before visualization.
• Users will be able to develop their own parameterized visualization methods with custom interactive behavior.
• Users can also implement post-processors that modify the result of a visualization.
• All customizable visualization methods will be able to take custom pre and post callbacks as parameters.
• Figures will be reproducible, even from the SCE, by calling the written script.
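The extension mechanism with pre/post callbacks can be sketched as a small plugin registry; the API names here are ours, for illustration of the design, not DVP's actual scripting interface.

```javascript
// Hypothetical plugin registry: users register a named visualization
// method; optional pre/post callbacks transform the data before
// plotting and the result after it.
const registry = new Map();

function registerPlot(name, render) {
  registry.set(name, render);
}

function runPlot(name, data, { pre, post } = {}) {
  const render = registry.get(name);
  if (!render) throw new Error(`unknown plot: ${name}`);
  const input = pre ? pre(data) : data;   // pre-processing callback
  const result = render(input);           // user-defined plot method
  return post ? post(result) : result;    // post-processing callback
}

// Example user extension: a trivial plot that counts its data points.
registerPlot("count", (d) => ({ marks: d.length }));
```

A figure produced this way is reproducible from a script: re-running the same `runPlot` call with the same data and callbacks yields the same result.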
Visual interaction with plots and models is a key part of visualization techniques. DVP provides a set of animated and interactive visualization methods and, most importantly, provides users a scripting language, as mentioned above, to produce their own interactive behavior. This feature is divided into:
• A UI scroll bar can be used to control thresholds in the different visual methods that make use of them.
• The user can animate up to 2 million data points in a non-interactive movie.
• The user can use automatic or manual calibration to change the constraints on dataset sizes considered big or small.
DVP features related to UX can be listed as follows:
• DVP will be cross-platform, running on the three well-known operating systems: Windows, Linux, and Mac.
• DVP will be able to run on most tablets and smartphones.
• DVP can run on multi-touch screens.
• DVP will support running on multiple screens, providing a way to manage distributing figures among the various screens.
• DVP will provide the ability to toggle between free and docked figures.
• DVP will be able to use a predefined set of hand gestures as a means of input.
Fig. 7. DVP architecture and design
DVP will provide methods to export visual models in the following forms:
• Images in the raster formats supported by ImageMagick.
• Images in the SVG vector format.
• Interactive PDF documents to encapsulate visualization with reporting.
• Printing a single figure or multiple figures to any paper size.
The system will access the web to do the following tasks:
• Store user session information so that the user can reload the session later from any machine.
• Save and share current visual models with colleagues or partners using the web, as if each were a web post.
• DVP will provide for creating both local-machine and online business dashboards with multiple interactive linked figures.
• Dashboards created by DVP can obtain data from web or local data sources.
• Importing data from SQL storage servers to dashboards can be done with a UI wizard, without writing any SQL code, for the convenience of non-technically-oriented users.
3 GLOBAL SYSTEM ARCHITECTURE AND DESIGN
The main target of the design is to create an extensible, easy-to-use, beautiful visualization platform that integrates seamlessly with all Scientific Computing Environments (SCEs). The current DVP design consists of a plugin, written in Java, for the intended SCE; a web server that uses JSON serialization; and the DVP itself, built with web technologies running on CEF (Figure 7). In this section, we first introduce the technologies used in the rest of the paper (Section 3.1). Then we go over several design alternatives and approaches to document some design decisions that we took to finally reach this design of DVP.
D3 [26] is a Javascript library created by Mike Bostock [27] for visualization. It implements a Data-Driven Document model, which works by associating data with DOM elements, in order to facilitate common visualization tasks such as selection, manipulation, addition, and deletion of data points. It implements the model using a syntax similar to that of jQuery [28], which facilitates chaining several actions on a data set in one-line calls. This "chaining" is not just syntactic sugar; in fact it has a large effect on performance, because browsers can optimize consecutive rendering/relayout/repaint/restyle calls [29]. D3 uses the SVG standards introduced with HTML5 for drawing elements. SVG primitives are represented as DOM elements, which allows for using any existing Javascript/CSS libraries to manipulate/style the visualization elements. As an example, attaching an image to a data point represented as a circle in a scatter plot would be as easy as adding an image tag to the circle's DOM. Modern browsers can be seen as powerful and efficient rendering engines, a fact that D3 leverages. However, creating so many DOM elements causes problems with visualizations requiring more than 200,000 elements, as seen on our hardware and benchmarks; rendering performance does, however, scale with hardware.
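The chaining style can be illustrated without D3 itself by a toy fluent selection (our own sketch; `Selection`, `attr`, and `style` here mimic the pattern, they are not D3's implementation):

```javascript
// Minimal fluent "selection" illustrating why chaining works: every
// setter returns the selection itself, so attribute and style calls
// compose in a single expression, just as in D3/jQuery style code.
class Selection {
  constructor(nodes) { this.nodes = nodes; }
  attr(key, value) {
    for (const n of this.nodes) n.attrs[key] = value;
    return this; // returning `this` is what enables chaining
  }
  style(key, value) {
    for (const n of this.nodes) n.styles[key] = value;
    return this;
  }
}

const nodes = [{ attrs: {}, styles: {} }, { attrs: {}, styles: {} }];
const sel = new Selection(nodes).attr("r", 5).style("fill", "steelblue");
```

Beyond the syntax, batching the mutations this way is what lets a browser coalesce the resulting relayout/repaint work.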
Google Chrome is a modern web browser built by Google. It is multi-threaded and has powerful Javascript and rendering engines which leverage hardware acceleration through GPUs. Google Chrome is based on an open-source project called Chromium [30]. There exists a sub-project called the Chromium Embedded Framework [31], which is a framework for embedding Chromium-based browsers in other applications. CEF is available as dynamic link libraries which one can use to build applications. A new version of CEF is automatically generated for every new version of Chromium through automated code-extraction scripts. DVP is built using CEF. Chrome, Chromium, and CEF all share the same codebase and the same Javascript/rendering engines; at the core, they are in fact the same thing. They differ only in the GUI: Chrome comes with Gmail account synchronization and some other Google services; Chromium has a very similar UI but with a few features stripped out; CEF is a DLL. The GPU-accelerated compositing [32] of Chromium and its powerful rendering engine, Blink [33], are the reason we currently use CEF.
We believe that a client-server model should suffice for the needs of our system: the SCE acting as the server, and a separate process, called DVP, running the visualization code and acting as the client. Both processes would then communicate through a serialization interface or Remote Procedure Calls (RPC). The following aspects are to be considered:
1) Due to the different nature of each SCE in terms of available data structures, language, and features, a special component or plugin should be built for each one, but with a unified interface and uniform serialization specifications to reduce the SCE-specific code required in the DVP component. This component should be built with maximum re-usability in mind to avoid redundant design and coding.
2) The DVP as a platform should be built with extensibility and re-usability in mind, to account for the different scenarios of communication with different SCEs and user-added features/figures.
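The unified RPC interface of point 1 can be sketched as a dispatch table over a common `{method, params}` envelope; the method names and envelope fields here are hypothetical, chosen by us to illustrate the design.

```javascript
// Sketch of a unified RPC interface: every SCE plugin speaks the same
// {method, params} JSON envelope; only the per-method handlers differ.
const handlers = {
  ping: () => "pong",
  plot: (params) => ({ figure: params.kind, points: params.data.length }),
};

// Dispatch a raw JSON message to its handler and serialize the reply.
function dispatch(rawMessage) {
  const { method, params } = JSON.parse(rawMessage);
  if (!(method in handlers)) return JSON.stringify({ error: "unknown method" });
  return JSON.stringify({ result: handlers[method](params) });
}
```

Adding support for a new SCE then means implementing the same handler table against that SCE's data structures, while the envelope, and hence the DVP-side code, stays unchanged.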
SCE plugins can be written in different languages depending on each SCE's design. For the 3 main SCEs we target, namely Matlab, Mathematica, and R, a plugin can be written in:
1) The SCE language itself.
2) C++.
3) Java.
Writing the plugin in the SCE language itself means having to rewrite it from scratch for every SCE, and therefore a high development and maintenance cost and, in some cases, performance inferior to the other 2 options. C++ is of course minimalistic and fast; however, being native adds a high maintainability cost, which can be avoided with Java without sacrificing a lot of performance. Depending on the DVP language and architecture, different serialization or RPC methods can be used; each one has its pros and cons, but their impact thus far is not large. We discuss them and their effects on our current design in Section 6.1.
To build the DVP itself, we had to choose between different languages and platforms. It can basically be built with anything: Java, C++, Javascript, or even one of the SCEs. What follows is a discussion of each platform and what it has to offer.
3.3.2.1 Java: has a cross-platform GUI framework; however, its performance in visualization is weak. Therefore, building a visualization tool using Java would require the use of OpenGL, or any Java library or framework built upon it. There exists a programming language, with a small platform built for it, for visualization using Java, called Processing [38], developed by MIT Media Lab. Even though it has a lot to offer, it was found that building unique interfaces with rich features would require a lot of development effort, compared to other platforms like D3. Also, there exists a Javascript port of the language.
3.3.2.2 C++: needless to say, offers the best performance when it comes to speed and efficiency. However, given the development effort required to build the visualization library and the rich interfaces mentioned in the user stories, and in order to offer the level of extensibility required, it would be very expensive and difficult to both design and build. However, C++ may be considered later on for visualizing large data samples or for integrating with native components or libraries needed by our system.
3.3.2.3 SCE Language: One could consider building the visualization platform using one of the free SCEs, like R for instance, taking advantage of users' familiarity with it and of the primitives already existing. However, this would limit the scope of the product to only scientific applications.
This would also limit the user base to only those who know that specific SCE, and limit the performance of the visualization algorithms to that of the SCE's GUI rendering engine.
3.3.2.4 Web technologies: The introduction of the SVG standards in HTML5 and the evolution of libraries such as D3 have given Javascript a large set of capabilities when it comes to visualization. Modern browsers are very powerful rendering engines, which enables Javascript to render large datasets with ease and take advantage of hardware acceleration. Javascript is also arguably the most popular programming language, which means a very large user base; and since Javascript is an interpreted, weakly typed language, extensibility design would be very easy. The only shortcoming of this approach is access to native libraries such as OpenCV, or those needed to integrate with certain hardware like Arduino, for instance. However, Chromium and its embedded framework provide a very feasible solution for this issue by allowing Javascript to execute C++ functions through V8 and retrieve data in a native Javascript format. This means that Javascript can access any memory available to C++. Another problem is handling massive data. By design, Javascript arrays can only have 32-bit indices; this means Javascript cannot handle data with more than 2 billion points. But for such data, a native visualization library would be needed anyway for special visualization. Therefore, a solution would be to have 2 modes: a full interactive mode and a simple heavy-visualization mode, where the heavy visualizations would be done through separate native libraries with specially optimized algorithms.
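One way the index ceiling can be worked around on the Javascript side (our own sketch, not DVP's solution) is to store a column as a list of fixed-size typed-array chunks and route indexing through chunk arithmetic:

```javascript
// A column stored as fixed-size Float64Array chunks; indexing is
// routed through chunk arithmetic, so the total length is not bound
// by the length of any single Javascript array.
class ChunkedColumn {
  constructor(chunkSize) {
    this.chunkSize = chunkSize;
    this.chunks = [];
    this.length = 0;
  }
  push(value) {
    const pos = this.length % this.chunkSize;
    if (pos === 0) this.chunks.push(new Float64Array(this.chunkSize));
    this.chunks[this.chunks.length - 1][pos] = value;
    this.length++;
  }
  get(i) {
    return this.chunks[Math.floor(i / this.chunkSize)][i % this.chunkSize];
  }
}
```

Typed arrays also store doubles compactly and contiguously, which matters long before the index limit is reached.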
If web technologies are to be used as a platform for DVP, a further comparison among Javascript/rendering engines is needed. This comparison revealed that WebKit is by far ahead in performance when compared to Gecko.
According to what is mentioned in Section 3.3.3, we choose WebKit and V8 to be the DVP engines. This gives us another set of options, compared below.
3.3.4.1 Google Chrome: pros:
• no build problems.
• more portable.
cons:
• no shared memory.
• no advanced integration with Javascript, since it is a closed box run in the web browser.
• cannot handle big data, because of the lack of control over it.
• no security.
• restrictions on Javascript access.
3.3.4.2 Chromium: pros:
• every con of Google Chrome, above, can be alleviated here.
cons:
• a maintenance problem with updates, as we must remove the Google-specific parts every time to stay up to date.
• cross-platform build requirements.
3.3.4.3 CEF: pros:
• every con of Google Chrome, above, can be alleviated here.
• no maintenance problem.
cons:
• cross-platform build requirements.
3.3.4.4 CEF variants: Many development platforms have been built around CEF: some for desktop development, some for server development, and even some for mobile development. We have decided to go with plain native CEF; however, below we keep the discussion and list other forks that we considered.
3.3.4.5 Crosswalk [39]: is maintained by Intel. It is mainly intended for mobile development, and that is where their main development and support goes; therefore it is a viable option when developing the mobile interface.
3.3.4.6 Awesomium [40] & TideSDK [41]: Awesomium is a commercial library, built on top of CEF; it offers an SDK with some built-in functionality, just like TideSDK. However, building on top of such libraries exposes us to bloat, less flexibility in design, and restrictions; e.g., we would be able neither to update nor to develop using new features of CEF as soon as they come out (see Section 4.3 for a real use case).
3.3.4.7 NodeJs [42]: is a headless V8 engine with no rendering engine such as WebKit.
Of course this cannot be used directly for visualization; however, NodeJs can act as a very strong server with rich access to native libraries, since NodeJs already integrates with a lot of native libraries. However, integrating it with CEF is very difficult, and is done by applying patches directly to the source code, as in the cefode project [43].

3.3.4.8 Node Webkit [44]: integrates NodeJs with Chromium. Unfortunately, Node Webkit has its own "packaging" system, and your app is "packaged" into a special zip format; it also has its own GUI library that builds on top of that of Chromium. Modifying it would be a lot of work. Using most of the libraries mentioned above would offer little help with the DVP development, while the cost of maintainability, control over CEF components, and updating would be too high. That is why we resorted to using plain CEF, since it meets all the DVP design philosophy aspects.

4 Chromium Embedded Framework (CEF)
This section discusses CEF and all its related issues in depth. It provides many technical details that may seem to be out of the scope of the present article. However, we would like to illustrate them because they relate very much to the high-level objectives and philosophy of DVP. So, down the rabbit hole we go.
The CEF project is, overall, poorly documented; however, the project wiki [45] is a good place to start. Some tutorials, e.g., [46, 47], provide a good introduction to most of the functionalities provided by CEF. Some forums, e.g., [48], are also a good resource for help; however, too many questions are left unanswered.

Upon downloading and extracting the binary packages, the sample application can be built using the provided build.sh file on Linux, or using the sln file on Windows. These project files are automatically generated through "Generate Your Project" (GYP) [49] scripts; editing these files is quite tedious. Also, editing the GYP files is not possible without downloading almost all of the Chromium source. For those reasons, we decided to write our own build scripts using CMake [50], to allow for a consistent cross-platform build.
CEF comes with two sample applications: CefSimple and CefClient. The first is a minimalistic window with a single frame; the second implements almost all of CEF's features. We decided to build upon CefSimple instead of stripping down CefClient.

Both CEF sample applications have three versions, one for each of Windows, Mac, and Linux, using the native window toolkit of each of those platforms. We wanted to build a cross-platform application, and doing so requires building the GUI three times. We know that, by DVP design, the UI will mostly rely on Javascript; however, in the future, when adding heavier methods that require, e.g., OpenGL, we might need to build some native controls outside the CEF frame. Therefore, we decided to use wxWidgets, which is a library built as an abstraction layer on top of the native libraries of each platform: Windows Forms for Windows, GTK for Linux, and Cocoa for Mac. However, embedding CEF still requires a native applet, e.g., gtk_vbox, which required digging inside wxWidgets and finding a control that exposes such an applet and extracting a reference to it. This is discussed further in Section 4.4.1.

Also, for some unknown reason, CEF includes its header files using relative paths, i.e., quotations. This destroys any hope of building outside the directory of CEF, since it requires maintaining the hierarchy where the includes are located. As a solution, we created a CMake file in the intended directory and built everything relative to that path. Since CMake supports out-of-source builds, everything is built into __build.

After CEF was embedded into wxWidgets, we wanted to integrate it with other libraries, such as OpenCV, to be accessible from Javascript. This required delving into a larger problem, which is encodings. Javascript is not designed to handle binary data. Typed arrays [51] do exist in Javascript and are supported by all Chrome variants, including CEF; however, CEF does not expose an interface for creating these arrays from C++ [52].
The best way to pass binary data, as advised by Google support, is surprisingly to create an XHTML request and catch it [52]. The available options for passing binary data are:
• an XHTML request, which is the worst option in terms of performance, design, and implementation.
• encoding into strings, which is implied by the first option. It is similarly bad in performance, but not in design.
• converting everything into normal float arrays, then passing those instead. This is feasible, with acceptable performance; however, the overhead of copying data can sometimes be unacceptable if the data size is large.
• direct Proxies [53], which are a sort of overloading of the Javascript [] operators using a C++ function; i.e., when arr[0] is executed, a C++ function is called with the parameter "index" with value 0, so we can really make Javascript access any block of memory available to C++ using pointers. Now, Javascript arrays are C++ pointers. That is really cool, but not so fast!

Proxies are a part of ECMA 6, which was not fully implemented nor supported in Chrome variants at the time of releasing the first version of DVP [54]. However, incomplete support is provided through an experimental feature, which is turned on by passing the -harmony flag to V8 and by using a library [55] to account for the missing features. This should be fixed in later versions, after the launch of ECMA 6.

As mentioned above, wxWidgets is an abstraction layer built on top of each platform's native window toolkit. This means that using wxWidgets requires using the native windows at the end of the day. However, if, at some point, direct access to the native toolkit objects is required, a hack or an extension is needed; this hacking/extending will be platform-specific code. Since the hacking solution was not only disastrous but also quite tedious (since almost none of the wxWidgets panels uses gtk_vbox on Linux), we decided to extend it.
However, even that required a little hacking of its own, since its extension is neither documented nor standard.
CEF requires a blocking call for its event and render loops to start; this call blocks until CEF is shut down. This caused a problem with wxWidgets, because it requires launching another thread from which that function is called. The optimal solution, however, is to integrate the CEF loop with the wxWidgets render loop. wxWidgets starts its render loop using a macro; this macro can be overridden to run your own render loop (see "Making render loop" [56]). This would have required a lot of work; rather, we proceeded by starting the CEF thread in an event call. Events are multi-threaded but safe(r) when interacting with wxWidgets GUI components [57]; until now, no complications have occurred.

4.4.2.1 Extending wxWidgets on Linux: After diving into the abyss of the wxWidgets implementation for Linux, we emerged with the fact that there exists a variable named m_widget, which contains the actual GTKWidget being used by any control that inherits from wxControl. Moreover, a solution is already available and posted on the wxWidgets forums [58].

4.4.2.2 Building Problems on Linux: wxWidgets is built using AutoTools, then installed into a folder of choice, with symbolic links to the compiled files. We wanted the build to use the wxWidgets build available in our repository instead of the one coming with Linux distros. There exists a CMake variable called wxWidgets_ROOT_DIR, which turned out to work only on Windows. In addition, it turns out that wxWidgets comes with its own wx-config file, which is similar to pkg-config; therefore, it is necessary to specify this file using the CMake variable wxWidgets_CONFIG_EXECUTABLE. Another issue is that wxWidgets automatically adds an -isystem flag to the compiler parameters for some reason; this, of course, destroys any attempt to compile. The only way to disable it is to set the variable wxWidgets_INCLUDE_DIRS_NO_SYSTEM to true.
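The CMake settings just described can be sketched as the following configuration fragment. The paths are hypothetical; only the variable names (wxWidgets_CONFIG_EXECUTABLE, wxWidgets_INCLUDE_DIRS_NO_SYSTEM) are taken from the discussion above.

```cmake
# Point CMake at the wxWidgets build shipped in our repository
# (wxWidgets_ROOT_DIR alone works only on Windows).
set(wxWidgets_CONFIG_EXECUTABLE
    "${CMAKE_SOURCE_DIR}/third_party/wxWidgets/build/wx-config")

# Prevent wxWidgets from injecting -isystem flags that break the compile.
set(wxWidgets_INCLUDE_DIRS_NO_SYSTEM TRUE)

find_package(wxWidgets REQUIRED COMPONENTS core base)
include(${wxWidgets_USE_FILE})

add_executable(dvp main.cpp)
target_link_libraries(dvp ${wxWidgets_LIBRARIES})
```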
There are two ways to export OpenCV into Javascript: (1) by making a full-fledged interface, with matrix objects that resemble the OpenCV Mat class (this is the way the NodeJs OpenCV package does it); or (2) by making a limited set of functionality that is called through functions. By the time of releasing the first version of DVP, we had only created a limited set of functions as a proof of concept. Encoding images is also one of the main issues with OpenCV, since Javascript can only handle images encoded in base64 strings, which required using external libraries to encode such images. However, an alternative solution [59] is to use a canvas to display images and supply it with a typed array. Also, this can possibly be mixed with a Proxy to avoid copying, which we have not investigated yet.
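The Proxy idea can be sketched in plain Javascript as follows. Here, the native side is simulated by an ordinary typed array, and all names are illustrative; in DVP, the read callback would land in C++ through CEF, so element access would cross into native memory without copying.

```javascript
// Sketch: expose a block of "native" memory to Javascript through a Proxy,
// so that arr[i] is answered by a callback instead of a copied array.
function makeNativeArrayView(readElement, length) {
  return new Proxy({}, {
    get(target, prop) {
      if (prop === "length") return length;
      if (typeof prop !== "string") return undefined;
      const index = Number(prop);
      // Every numeric access goes through the callback: no data is copied.
      if (Number.isInteger(index) && index >= 0 && index < length) {
        return readElement(index);
      }
      return undefined;
    }
  });
}

// Simulated native buffer (in DVP this would be, e.g., an OpenCV Mat).
const nativeBuffer = new Float64Array([3.14, 2.71, 1.41]);
const view = makeNativeArrayView(i => nativeBuffer[i], nativeBuffer.length);
// view[0] now reads through the callback; view behaves like an array.
```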
5 Javascript Application
DVP is a standalone application that we planned to write using web technologies (HTML5, CSS/CSS3, and Javascript), with C++ as its backend through the CEF integration. As we mentioned earlier, our general architecture for DVP is a client-server model. This should not lead to the wrong conclusion that DVP cannot run without a server-side component. It should be very clear that the client-server architecture has been chosen to handle the SCE integration with DVP; in addition, DVP can run as a standalone application without the need for any server-side component. On the other hand, had we opted to use something like NodeJs as our backend, DVP would have become a client-server application. In this section we compare the different Javascript frameworks and the different ways of drawing figures; also, we elaborate on the architecture and on how the user requirements are fulfilled.
HTML is great for declaring static documents; however, it falters when used for declaring dynamic views in web applications. AngularJS allows for extending the HTML vocabulary for applications. The resulting environment is extraordinarily expressive and readable, and quick to develop. Other frameworks deal with HTML's shortcomings either by abstracting away HTML, CSS, and/or Javascript, on the one hand, or by providing an imperative way of manipulating the DOM, on the other. Neither of these addresses the root problem: HTML was not designed for dynamic views. AngularJS is a toolset for building the framework that is most suited to application development. It is fully extensible and works well with other libraries. Every feature can be modified or replaced to suit a unique development workflow and feature needs. For more details, check the Angular website [60].
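As a tiny illustration of extending the HTML vocabulary, the canonical AngularJS (1.x) two-way data-binding example declares a dynamic view with no imperative DOM code; the paragraph below updates automatically as the user types (the field name is hypothetical):

```html
<!-- Two-way binding: ng-model ties the input to a scope variable,
     and the {{ }} expression re-renders whenever that variable changes. -->
<div ng-app>
  <input type="text" ng-model="figureTitle" placeholder="Figure title">
  <p>Current title: {{figureTitle}}</p>
</div>
```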
The application-side architecture should not, of course, be confused with the architecture discussed in Section 3. There are three main components of the application-side architecture:
User Interface (UI): should provide an excellent user experience. Its implementation relies heavily on mastering:
• Jquery, CSS, HTML.
• Angular and data binding.

Core: is related to how we manage the underlying layers of the application, like validation, SCE integration, CEF integration, etc. Its implementation relies heavily on mastering:
• Javascript.
• Object Oriented Design (OOD).
• the dependency injection design pattern.
• Angular and its services.

Figures: is related to how the figures themselves, which the user will interact with, are created. Their implementation relies heavily on mastering:
• D3.
• Jquery.
• Angular.
• how render engines work.
• data visualization foundations, e.g., geometry, probability, statistics, linear algebra, etc.

This component should:
• manage the underlying layers of the application that the UI reflects.
• keep the system in a consistent state, by validating every input/output action and by providing both the data structure and the logic to handle this.
• integrate with anything outside the Javascript application, e.g., the SCE or CEF.

Therefore, and because of the lack of classes and OOP in Javascript, we had to take care of several technicalities, as follows.

5.2.1.1 Data Structure: Simple lists hold Javascript objects; each object has a definition for its fields and a unique field that is used to store/access/delete the object in the list. The unique field name is given as a parameter in the list constructor. In principle, although not needed so far, an object may have a field that needs a special data structure. We also needed to take care of:
• object duplication.
• a definition for each object type; e.g., the data-source object definition is id, name, data, ColumnNames, ColumnTypes.
• validation that each object of a specific type has the required fields to be added to the system.
• validation of the required fields' values.

5.2.1.3 OOP: we needed to reinvent almost everything in this regard. A Validation class has all the required functions to validate an object according to a given set of parameters.
The List class, a service in Angular, can add/delete/modify/validate an object in the list, in addition to some other utilities, e.g., checking for element existence. Its constructor takes the object definition, the key field name, and the list name as parameters. A class X then inherits the class List and adds/overloads other functions if needed.

5.2.1.4 SCE integration: For integration with SCEs, we simply need to establish the two-way communication between DVP and the SCE. Therefore, we need to send commands from DVP (or the SCE) to the SCE (or DVP). As mentioned earlier, we follow the client-server model to handle this communication, using JSON objects passed between DVP and the SCE. From the DVP side, we use AJAX functions from JQuery to handle post requests, and SSE to handle get requests. Alternatively, the Angular service $http could be used; however, we found that the SCE plugin does not work with this approach, which may need more debugging.
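A minimal sketch of such a List service in plain, pre-class Javascript follows; the field names and validation rule are hypothetical, but the shape (constructor parameters, prototype inheritance, validation before adding) mirrors the description above.

```javascript
// Sketch of the List "class": stores objects keyed by a unique field,
// validates required fields before adding, and offers existence checks.
function List(objectDefinition, keyField, listName) {
  this.definition = objectDefinition; // required field names
  this.keyField = keyField;           // unique key field, e.g., "id"
  this.name = listName;
  this.items = {};
}

List.prototype.validate = function (obj) {
  // Every field in the definition must be present on the object.
  return this.definition.every(field => field in obj);
};

List.prototype.add = function (obj) {
  if (!this.validate(obj)) throw new Error("invalid " + this.name + " object");
  const key = obj[this.keyField];
  if (key in this.items) throw new Error("duplicate key: " + key);
  this.items[key] = obj;
};

List.prototype.exists = function (key) { return key in this.items; };
List.prototype.remove = function (key) { delete this.items[key]; };

// A specialized list inherits List and may overload its functions.
function DataSourceList() {
  List.call(this, ["id", "name", "data", "ColumnNames", "ColumnTypes"],
            "id", "data-source");
}
DataSourceList.prototype = Object.create(List.prototype);

const sources = new DataSourceList();
sources.add({ id: 1, name: "iris", data: [], ColumnNames: [], ColumnTypes: [] });
```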
For the core architecture mentioned above to receive commands, e.g., to create a figure, it is only required to get the information from the user, construct it in a Javascript object, and pass it to the core to be validated and added. Similarly, to receive commands from the SCE, there is a protocol that defines the kind of operation needed; then the object constructed by the SCE plugin (Section 6) from the user input is passed to the related function in the core, e.g., figure.add(constructedObject). The protocol is also an object definition, since it is a JSON object.
Sending commands is established via the AJAX post function, which is very straightforward.
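This command path can be sketched as follows. The endpoint and message fields are hypothetical; the transport is injected as a function (in DVP it would be, e.g., JQuery's $.post), following the dependency-injection pattern mentioned above, which also keeps the core testable.

```javascript
// Build a JSON command message for the SCE and hand it to a post function.
function buildCommand(operation, payload) {
  // The protocol is itself just a JSON object definition.
  return { operation: operation, payload: payload };
}

function sendCommand(post, url, command) {
  return post(url, JSON.stringify(command));
}

// Usage with a stub transport that records what would be sent:
const sent = [];
const stubPost = (url, body) => { sent.push({ url: url, body: body }); return true; };

const cmd = buildCommand("figure.add", { type: "parallel-coordinates" });
sendCommand(stubPost, "/matlab", cmd);
```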
6 SCE Plugin
Since Javascript cannot handle binary data, some sort of serialization is necessary. However, serialization could deteriorate performance with larger datasets. Several serialization methods are possible:
• JSON and its binary variant, BSON.
• Google Protocol Buffers [61].
• Protocol Buffers' author's Cap'n Proto [62].
• The previously-Facebook, now Apache, Thrift [63].
• Apache ActiveMQ [64].
• Shared memory.

Each option has its own pros and cons. The first option, JSON, is the one we currently use. It is the most straightforward and the most native to Javascript. In addition, we do not observe performance issues at the moment; the time consumption is still acceptable at a matrix size of 200,000 × 50. Other libraries use specific formats, and porting them to handle Matlab objects is cumbersome. On the other hand, Cap'n Proto seems very promising, since it does not serialize; rather, it just copies bytes in a cross-platform manner.
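The JSON option can be sketched as follows. The message shape is hypothetical, but the round trip illustrates why this option is the most natural in Javascript: plain numeric matrices serialize and parse losslessly with no extra library.

```javascript
// Serialize a (small) numeric matrix into a JSON message for the SCE
// plugin, then parse it back, as DVP's current JSON option does.
function matrixToMessage(name, matrix) {
  return JSON.stringify({
    variable: name,
    rows: matrix.length,
    cols: matrix[0].length,
    data: matrix
  });
}

function messageToMatrix(json) {
  return JSON.parse(json).data;
}

const m = [[1, 2, 3], [4, 5, 6]];
const wire = matrixToMessage("X", m);   // what travels between DVP and SCE
const back = messageToMatrix(wire);     // deep-equals m after the round trip
```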
We have used Jetty because it is minimalistic, small, extensible, and sufficient for our purposes. The designed servlets architecture is meant to be as extensible and reusable as possible; adapting these servlets for any SCE should be trivial, as discussed in Section 6.2.2. For Matlab control, there are many approaches [65]. We decided to go with "MatlabControl", since it is simple and straightforward, and it works on existing open sessions without the need to open a new session.
DVTWebServerInterface is the entry point of the component; it can be extended to modify logic or add functionality for any SCE type if desired. E.g., we have extended it with MatlabDVTServerInterface, only to call the constructor with the $sceName parameter, to specify the server post service sse, and to specify the default $serializationType (json), since Java does not support compiler directives.
DVTWebServerInterface initializes four servlets: DVTNewDVTIdServlet, DVTSCEJsonServlet, DVTEventSourceServlet, and DVTEventSourceReplyServlet; then it maps them to the urls welcome, $sceName, sse, and sse-reply, respectively.

DVTNewDVTIdServlet is responsible for handshaking with DVP and sending configurations to it, including the urls to which all other servlets are mapped. It sends an object of NewDVTIdMessage, from messaging, serialized in $serializationType. It is also responsible for assigning IDs to DVP; however, this is done automatically when creating a new NewDVTIdMessage; the ID of the message is thread-safe and auto-incrementing.
DVTSCEJsonServlet is responsible for all interfacing with the SCE: connection, disconnection, evaluation, and storing variables. It does not need to be extended, e.g., to add support for other SCEs; rather, simply implement SCEJsonInterface and inject it into the constructor. It talks to DVP through an SCEEvalMessage, from messaging, serialized in JSON. One possibility is to make an abstract parent class and extend it, to save any redundant code in the several SCE communication servlets implementing different $serializationType.

DVTEventSourceServlet is responsible for registering SSE connections with DVP and for sending SSE events to it. It extends the Jetty implementation of the SSE servlet (located at ./org/eclipse/jetty/servlets/EventSource*.java). When an SSE request is sent, it is assigned a thread-safe ID and added to a thread-safe list.
DVTEventSourceReplyServlet uses this ID to check for replies to this specific request.
DVTEventSourceReplyServlet is responsible for accepting replies to SSE requests and for doing blocking waits for the replies (if required), using the thread-safe lists and the IDs supplied to it and obtained from DVTEventSourceServlet. Yet another important to-do task is to make the sse message an sse json message. Next, for more clarification, we provide a simple scenario for the SSE workflow:
1) SSE initialization is requested by DVP; the eclipse implementation calls the function newEventSource, which is abstract and implemented in this class, and the DVP is added to the $eventSources list in DVTEventSourceServlet.
2) An SSE request is sent through the function sendDataToDVTClient in DVTEventSourceServlet; the request is assigned a static, thread-safe ID that is created once an instance of SSEMessage, from messaging, is created.
3) The SCE calls waitForSSEReply in DVTEventSourceReplyServlet, providing it with the request ID returned from sendDataToDVTClient and the DvpId. The servlet blocks with a timeout, checking the thread-safe list for the reply's arrival.
4) DVP sends a reply to the sse-reply url, and an instance of SSEReplyMessage is created and added to the thread-safe list, which is checked in waitForSSEReply.

SCEJsonInterface, an interface with the SCE, provides connection, disconnection, evaluation, and storage functionalities. It is required for use in DVTSCEJsonInterfaceServlet. Another to-do task is to make an abstract parent and extend it to handle other $serializationType. These are the classes/interfaces that need to be implemented/extended to add functionality for other SCEs.
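The request/reply bookkeeping of steps 2–4 above can be sketched as follows. The real implementation lives in the Java servlets; this Javascript sketch (with illustrative names) shows the core idea: each outgoing request gets a unique ID, and the reply handler resolves whichever waiter registered that ID, with a timeout guarding the wait.

```javascript
// Sketch of the SSE request/reply matching. In Java, nextRequestId and
// pendingReplies would be thread-safe; here a Map and a counter suffice.
let nextRequestId = 0;
const pendingReplies = new Map(); // requestId -> resolve function

function sendDataToDVTClient(payload, timeoutMs) {
  const id = nextRequestId++; // auto-incrementing ID, as in SSEMessage
  const reply = new Promise((resolve, reject) => {
    pendingReplies.set(id, resolve);
    setTimeout(() => {        // the blocking wait has a timeout
      if (pendingReplies.delete(id)) reject(new Error("SSE reply timeout"));
    }, timeoutMs);
  });
  // ...here the payload would be pushed to DVP over the SSE channel...
  return { id: id, reply: reply };
}

function onSSEReply(requestId, data) {
  // Called when DVP posts to the sse-reply url.
  const resolve = pendingReplies.get(requestId);
  if (resolve) { pendingReplies.delete(requestId); resolve(data); }
}
```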
7 Conclusion and Future Work
This article presented the design and implementation of DVP, a Data Visualization Platform, with an architecture that attempts to bridge the gap between the Scientific Computing Environments (SCEs) that analyze data and the Data Visualization Software (DVSW) that visualizes data. DVP is designed to connect seamlessly to SCEs, to bring the computational capabilities to the visualization methods in a single coherent platform. DVP is designed with two interfaces, the desktop standalone version and the online interface. A free demo for the online interface of DVP is available [1]. Although the architecture of DVP is flexible enough to allow for integration with any SCE, the current implementation is only provided for Matlab. The future version of DVP is an open-source version that integrates with Python, to provide wider support for the whole Python community, in general, and for the "Data Science" community in particular.

References

[1] [Online]. Available: http://dvp.mesclabs.com/
[2] C.-h. Chen, W. Härdle, and A. Unwin, Handbook of Data Visualization. Berlin: Springer, 2008.
[3] E. J. Wegman, "Visual Data Mining," Stat Med, vol. 22, no. 9, pp. 1383–1397, 2003. [Online]. Available: https://doi.org/10.1002/sim.1502
[4] ——, "The Grand Tour in k-Dimensions," in Computing Science and Statistics. Statistics of Many Parameters: Curves, Images, Spatial Models. Proc. 22nd Symposium on the Interface. Springer-Verlag, New York, 1992, pp. 127–136.
[5] A. Inselberg, "Visualization & Data Mining for High Dimensional Datasets: Tutorial," Unpublished Work, 2011.
[6] A. Inselberg and T. Avidan, "Classification and Visualization for High-Dimensional Data," Boston, Massachusetts, United States, 2000. [Online]. Available: https://doi.org/http://doi.acm.org/10.1145/347090.347170
[7] A. Inselberg, "Visualization and Data Mining of High-Dimensional Data," Chemometrics and Intelligent Laboratory Systems, vol. 60, no. 1-2, p. 147, 2002.
[8] V. S. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory, and Methods. New York: Wiley, 1998.
[9] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. Boston: Academic Press, 1990.
[10] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2001.
[11] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
[12] V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York: Springer, 2000.
[13] ——, Statistical Learning Theory. New York: Wiley, 1998.
[16] L. Wilkinson, The Grammar of Graphics. Springer Science & Business Media, 2006.
[17] H. Wickham, Ggplot2: Elegant Graphics for Data Analysis. New York: Springer, 2009.
[18] A. Inselberg, Parallel Coordinates: Visual Multidimensional Geometry and Its Applications. New York: Springer, 2009.
[66] M. Bostock, V. Ogievetsky, and J. Heer, "D3: Data-Driven Documents," IEEE Transactions on Visualization and Computer Graphics.
[67] E. J. Wegman, "Hyperdimensional Data Analysis Using Parallel Coordinates," Journal of the American Statistical Association, vol. 85, no. 411, pp. 664–675, 1990.
[68] [Online]. Available: http://bl.ocks.org/mbostock
[69] [Online]. Available: http://christopheviau.com/d3list/

Appendix A: Background, Tutorials, and Motivation from Textbooks
This section is a very short account of, and tutorial on, data visualization and graphics, taken almost entirely from textbooks; it includes the history of, the importance of, and the motivation for exploratory data analysis and the Grammar of Graphics (GoG). It is important for a reader who is not fully acquainted with the field; however, it can be trimmed out of the manuscript without affecting its coherence.
A.1 History and Evolution of Graphics
Figure 10 [which appears in 2, as Figure 1.1] provides a graphic overview of the evolution of data visualization, presented as the density of major developments in the field over time. The epoch of 1850–1900 was named the "golden age" for the many innovations in graphics and thematic cartography that took place for understanding data. The epoch of 1900–1950 was named the "modern dark age" for the decline in graphics and visualization development, as a result of the rise of quantification and formal models and the tendency to quantize and formalize things. The epoch of 1950–1975 was named the "rebirth of data visualization," as a result of the great developments known in the literature as Exploratory Data Analysis (EDA), which connects visualization to analysis and quantification. The epoch of 1975–2000 was named "high-D interactive and dynamic data visualization," for the invention of many new methods of visualization and interaction, new methods of visualizing high-dimensional data, etc.

On the other hand, "Computing advances have benefited exploratory graphics far more...The importance of software availability and popularity in determining what analyses are carried out and how they are presented will be an interesting research topic for future historians of science...In the world of statistics itself, the packages SAS and SPSS were long dominant. In the last 15 years, first S and S-plus and now R have emerged as important competitors. None of these packages currently provide effective interactive tools for exploratory graphics, though they are all moving slowly in that direction as well as extending the range and flexibility of the presentation graphics they offer." [2]. We add Matlab and Mathematica, which are two very important and powerful data analytic software packages, to this list.
A.2 Exploratory Data Analysis: importance and example on
This section conveys both the scientific need and the financial opportunity for a sound and elaborate data visualization software. We borrow Figure 8, with little modification, from [18, Sec. 10.2.2]. The dataset is part of hyperspectral satellite data for a portion of Slovenia, in Europe; the map of that portion is on the right of Figure 8. The dataset consists of 9 dimensions and 9,000 observations. Each observation represents a point on the map, with 2 dimensions (named X and Y) for its location; the other 7 dimensions (named B1–B7) are data collected from satellite measures for that particular point.

The first aspect of good data visualization is the ability to view data in dimensions higher than three! Figure 8 (first row, left) is a parallel-coordinate plot [18, 67] for this dataset, produced by Parallax, the commercial software of the author of [18]. In ||-coords, axes are located parallel to each other, as opposed to the perpendicular Cartesian coordinate system. A point in ||-coords is represented as connected line segments, which intersect the variable axes at the corresponding feature values of that point. For example, in Figure 8 (first row, right), the point on the map pointed to by the blue arrow corresponds to the blue line in the ||-coords (first row, left).

The second aspect of good data visualization is the ability of "interaction" with the available figures or plots, and of "linking" among these plots. "Interaction" is the ability to select, using GUI actions, parts of the data that may be of visual interest. Each group should be colored differently, with a transparency level (through an alpha channel), so that different patterns are distinguishable; this is called brushing. "Linking" is the ability to automatically select the same set of observations on other plots when those observations are selected on one plot.
For example, when the observation represented by the blue line in the ||-coords is selected, the corresponding point should be selected on the map: its X and Y coordinates take the values shown on the ||-coords, and the other 7 features (B1–B7) correspond to its satellite measures.

Examining the ||-coords plot of this data reveals a weird pattern at the bottom of B4. Selecting this pattern (as brushed in Figure 8 (second row, left)) surprisingly indicates that those observations correspond to the lake of Slovenia (as brushed in Figure 8 (second row, right)), thanks to "linking". This is a wonderful shortcut to modeling and clustering this dataset. This visual inspection gives us the hypothesis that water in this part of the land can be detected from satellite data by only thresholding the variable B4.

For more elaboration on ||-coords, Figure 13 [as appears in 18, Sec. 10.2.2] explains the geometry of a 2D line in both perpendicular coordinates (the usual Cartesian system) and in ||-coords, which may not be intuitive at all for newcomers to ||-coords.

A smart data analyst should study the data visually with many plots and visualization methods beyond the ||-coords, e.g., matrix plots, histograms, and projection pursuit, among other dozens of available methods; all should be linked to each other, as mentioned above. A snapshot of a few of these methods is illustrated in Figure 9, which is borrowed from [66]. For a good survey of the literature on data visualization methods, the reader may refer to [2]; for a comprehensive interactive gallery and examples, refer to [66, 68, 69]. However, we provide this simple example only for illustrating the concept.

A.3 Grammar of Graphics (GoG)
A good example to further explain the idea of GoG is adopted from [16], from where Figures 11–12 are borrowed. We will not talk here about the semantics of the figure and the striking information revealed concerning some countries (which is out of our current scope). We will focus on the GoG that, if it exists, abstractly and generically enough, will produce such a figure, and other more complicated figures, very efficiently. The design tree of the GoG of Figure 11 is drawn in Figure 12. Each line-ending arrow depicts some relation between its two connectors, similar to those adopted in relational databases. The corresponding pseudo-code grammar that describes Figure 11 is this:
ELEMENT: point(position(birth*death), size(zero), label(country))
ELEMENT: contour(position(smooth.density.kernel.epanechnikov.joint(birth*death)), color.hue())
GUIDE: form.line(position((0,0),(30,30)), label("Zero Population Growth"))
GUIDE: axis(dim(1), label("Birth Rate"))
GUIDE: axis(dim(2), label("Death Rate"))
Notice that the figure is full of information and many overlaid plots, including colored contour plots, a 2-D function (the straight line), and plot labels. However, its GoG descriptor is terse and efficient. Moreover, and most importantly, it is flexible and extensible, which is one of the most important features of DVP. DVP is designed to provide a scripting language that follows the GoG of [16], to accomplish the extensibility feature discussed in Section 2.4.2.
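To illustrate how such a terse descriptor could be carried inside DVP, the following hypothetical sketch represents the grammar above as a plain Javascript object and enumerates its layers. This is not DVP's actual scripting language; only the element/guide contents are taken from the descriptor above.

```javascript
// Hypothetical: the GoG descriptor above as a declarative Javascript object.
// A renderer would walk the spec and overlay one layer per element/guide,
// which is what makes the grammar flexible and extensible.
const figureSpec = {
  elements: [
    { type: "point",   position: "birth*death", size: "zero", label: "country" },
    { type: "contour",
      position: "smooth.density.kernel.epanechnikov.joint(birth*death)",
      color: "hue" }
  ],
  guides: [
    { type: "form.line", position: [[0, 0], [30, 30]],
      label: "Zero Population Growth" },
    { type: "axis", dim: 1, label: "Birth Rate" },
    { type: "axis", dim: 2, label: "Death Rate" }
  ]
};

// Count the overlaid layers the renderer would draw.
function layerCount(spec) {
  return spec.elements.length + spec.guides.length;
}
```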