A Testing Environment for Continuous Colormaps
Pascal Nardini, Min Chen, Roxana Bujack, Michael Böttinger, Gerik Scheuermann
Fig. 1. Left: Screenshot of our new interactive test suite implemented in the CCC-Tool. Right: Two visualizations show at (b) the valley-shaped Six-Hump Camel Function [15] for the area [−, ] × [−, ] and at (a) the LittleBit test function (Sect. 4.2.2). Special features of the Six-Hump Camel function are irregularly shaped troughs in the center of the area that are more than two orders of magnitude smaller than the data range difference between center and boundary. The uniformly spaced colormap cannot reveal the troughs, while the last nonlinear colormap clearly shows the topological structure of the irregularities in the center of the domain.

Abstract—Many computer science disciplines (e.g., combinatorial optimization, natural language processing, and information retrieval) use standard or established test suites for evaluating algorithms. In visualization, similar approaches have been adopted in some areas (e.g., volume visualization), while user testimonies and empirical studies have been the dominant means of evaluation in most other areas, such as designing colormaps. In this paper, we propose to establish a test suite for evaluating the design of colormaps. With such a suite, users can observe the effects when different continuous colormaps are applied to planar scalar fields that may exhibit various characteristic features, such as jumps, local extrema, ridge or valley lines, different distributions of scalar values, different gradients, different signal frequencies, different levels of noise, and so on. The suite also includes an expansible collection of real-world data sets including the most popular data for colormap testing in the visualization literature. The test suite has been integrated into a web-based application for creating continuous colormaps (https://ccctool.com/), facilitating close inter-operation between design and evaluation processes. This new facility complements traditional evaluation methods such as user testimonies and empirical studies.
Index Terms—Testing Environment, Color Perception, Scalar Analysis
1 INTRODUCTION
In many fields of computer science, algorithms are commonly evaluated using open libraries of predefined tests. A typical example is the field of combinatorial optimization with ACM GECCO [1] as its leading conference. New algorithms are often tested against standard problems (e.g., the traveling salesman problem or the quadratic assignment problem), using established libraries (e.g., TSPLIB [35] or QAPLIB [11]). These libraries contain test data sets of increasing difficulty, each typically focusing on different (known) challenges in combinatorial optimization. Similarly, the natural language processing (NLP) community frequently uses test data sets in the Semantic Evaluation suite [63]. In the past, algorithmic developments in NLP benefited substantially from the TREC collection of test data [55].

Algorithmic development has been a core component of scientific visualization. There were earlier attempts to create an open collection of test data, such as in the area of volume visualization, but not many collections remain available (e.g., [Klacansky:2020:web]). Whilst there are data sets from the SciVis contests, the evaluation of an algorithm or the comparison of different algorithms is usually conducted with proprietary data sets chosen by the evaluator (who is often also the algorithm developer). In some cases, the evaluation or comparison is accompanied by a usability study involving domain experts. A colormap implicitly encodes an algorithmic transformation from a data set to its visual representation. Hence the evaluation of such a transformation is primarily about the colormap concerned. In this paper, we report a test suite for evaluating continuous colormaps, in the spirit of the open approach seen in other fields as well as in the earlier decades of scientific visualization. This does not mean abandoning usability studies; rather, the test suite can efficiently answer many questions and help improve colormap designs before or after a usability study. The test suite will also allow the VIS community to collect test data representing different challenges to colormap designs or different applications, as well as helping to identify guidelines to use or pitfalls to avoid.

Color mapping is probably the most commonly used method for transforming data variables into visual variables. Color heatmaps are ubiquitous in all scientific applications featuring captured or simulated data with a spatial context (e.g., geographical data, imagery data, simulation results underpinned by some geometry, and so on). Every domain expert dealing with 2D scalar fields knows how to observe a color heatmap. Color mapping is also an integral component in almost all complex rendering techniques for visualizing height-, scalar-, vector-, and tensor-fields.

• Pascal Nardini is with the Institute of Computer Science, University of Leipzig, Germany. E-mail: [email protected].
• Min Chen is with the Department of Engineering Science, University of Oxford, UK. E-mail: [email protected].
• Roxana Bujack is with the Data Science at Scale Team, Los Alamos National Laboratory, Los Alamos, NM, USA. E-mail: [email protected].
• Michael Böttinger is with the German Climate Computing Center (DKRZ), Hamburg, Germany. E-mail: [email protected].
• Gerik Scheuermann is with the Institute of Computer Science, University of Leipzig, Germany. E-mail: [email protected].
One would expect to find some color mapping methods in any free or commercial visualization software system [2, 12, 52]. Selecting a good (continuous) colormap has been an enduring topic in visualization for many years (e.g., [23, 42]). There are several metrics for measuring the quality of colormaps (e.g., perceived color differences [10, 44]) and tools for creating colormaps (e.g., [30]). However, there is not yet a test suite for colormap designers to try out some design options in order to investigate how they may reveal patterns in different test data sets. Although one may test a colormap using an existing application data set, this is usually not adequate because, given a context, a colormap should not only reveal important patterns in known data sets, but also do the same for new data sets that may arrive in the future. A test suite can enable designers to speculate on different "what-if" scenarios and examine how a colormap may react. This paper presents a test suite designed to serve the above purpose.

The test suite consists of three parts. The first part contains a set of test functions that provide a comprehensive set of potential local properties of 2D scalar functions. For example, some functions may feature "jumps" at different value levels and with different increments or decrements, gradients of different scales starting from different levels, critical points, ridges, and different frequencies. In Sect. 4, we give a mathematical reasoning behind the selection of these functions.

The second part of the test suite contains functions to model a set of global properties. The designer of a colormap often encounters challenging data sets with complex variations of data ranges in different regions. For example, a multi-band colormap may suit a region with large but smooth jumps, while a slow-paced sequential colormap may suit a region with many small but volatile jumps.
Hence test functions that simulate such complex variations challenge colormap designers as well as colormap optimization algorithms. We use some well-known sample functions from the (continuous) optimization literature for this purpose. Complex variations can also be caused by some local properties alone, such as different signal-to-noise levels.

In addition to the two categories of test functions, the third part provides real-world data sets collected from different applications. We expect that this part of the test suite will continue to expand.

By integrating the test suite in our open-access CCC-Tool, which enables users to create, edit, and analyze colormaps [30], we now provide a complete design and test environment for developing and testing application-specific colormaps and for supporting all types of visualization applications and domain-specific data sets.

2 RELATED WORK
Continuous color mapping (also heat mapping) refers to the association of a color to a scalar value over a domain and can be considered the most popular visualization technique for two-dimensional scalar data. There are many heuristic rules for designing a good colormap, which have been applied mostly intuitively for hundreds of years [49, 50, 66]. The most important ones are order, high discriminative power, uniformity, and smoothness [10].

While some colormaps have been designed to serve as default colormaps for many data sets and can perform reasonably well in terms of rule compliance [29], many colormaps are purposely designed according to application-specific requirements such as the shape of the data, the audience, the display, or the visualization goal [3, 5, 36, 51]. The number of possible colormap configurations and the body of related work on this topic are huge [3, 36, 43, 57, 60].

An effort has been made to measure the quality of colormaps with respect to these rules quantitatively [10, 21, 29, 37, 54] or experimentally [19, 39, 60, 61], in order to develop theories and algorithms that can help automate the generation, evaluation, and improvement of colormaps. Although such theories and algorithms are usually general enough to be application-independent, the design of colormaps in many practical applications can only be effective if one includes some application-specific semantics in the design, such as key values, critical ranges, non-linearity, probability distributions of certain values or certain spatial relationships among values, and so on. Supporting such application-specific design effort is the goal of this work.
So far, there is no test suite for colormaps. However, the literature provides various examples where data sets were used for comparing colormaps and demonstrating color mapping algorithms. Sloan and Brown [51] suggest treating a colormap as a path through a color space and stress that the optimal choice of a colormap depends on the task, human perception, and display. They showcase their findings with x-ray and satellite images. Wainer and Francolini [58] point out the importance of order in a visual variable using statistical information on maps. Pizer [33, 34] stresses the importance of uniformity in a colormap, of which the curve of just noticeable differences (JNDs) is constant and in a natural order. Uniformity can be achieved by increasing monotonically in brightness or in each of the RGB components, such that the order of their intensities does not change throughout the colormap. Tajima [54] uses colormaps with regular color differences in a perceptually uniform colorspace to achieve perceptually uniform color mapping of satellite images. Levkowitz and Herman [21] suggest an algorithm for creating colormaps that produces maximal color differences while satisfying monotonicity in RGB, hue, saturation, and brightness. They test them with medical data from CT scans. Bernard et al. [4] suggest definitions of colormap properties, build relations to mathematical criteria for their assessment, and map them to different tasks independent of data properties in the context of bivariate colormaps. They test the criteria on analytic data with different shapes (e.g., different gradients and spherical surfaces). Pizer states that the qualitative task is more important in color mapping applications, because quantitative tasks can be better performed using contours or by explicitly displaying the value when the user hovers over a data point with the mouse. He uses medical images, including CT scans and digital subtraction fluorography, as example data sets.
Ware [60] also distinguishes qualitative and quantitative tasks. He agrees that the qualitative characteristics are more important and explicitly mentions tables as a suitable means for the visualization of quantitative values. On the one hand, he finds that monotonic change in luminance is important to see the overall form (qualitative) of his analytic test data consisting of linear gradients, ridges, convexity, concavity, saddles, discontinuities, and cusps. On the other hand, his experiments show that when a colormap consists of only one completely monotonic path in a single perceptual channel, the quantitative task is error-prone if one tries to read the exact data values based on the visualization.

Rogowitz et al. [3, 19, 38, 40–43] distinguish different tasks (isomorphic, segmentation, and highlighting), data types (nominal, ordinal, interval, and ratio), and spatial frequencies (low, high), recommending colormap properties for each combination. They perform experiments on the visual perception of contrast in colormaps using Gaussian or Gabor targets of varying strength superimposed on linear gradients of common colormaps [19, 39]. Rogowitz et al. use a huge variety of data through their extensive experiments, for example, text [40], MRI scans of the human head [42], weather data showing clouds or ozone distribution [42], vector field data from a simulation of the earth's magnetic field or jet flows [3, 42], measurements from remote sensing [43], cartographic height data [42], analytic data covering a broad spectrum of frequencies such as planar wave patterns [43], linear gradients distorted by a Gaussian or Gabor target of increasing magnitude [19, 39], the luminance of a human face photograph [38], and so on. Their work demonstrates the diversity in the application field of color mapping and how important it is for a colormap to encode application-specific semantics.
Zhang and Montag [65] evaluate the quality of colormaps designed in a uniform color space with a user study using a CAT scan and scientific measurements such as remote sensing and topographic height data. Gresh [16] measures the JND between colors in a colormap, using cartographic height data. Ware et al. [61] generate stimuli for experiments on colormap uniformity by superimposing vertical strips of Gabor filters of different spatial extent over popular colormaps, with magnitudes ranging from nonexistent at the top to very strong contrast at the bottom. The users' task is to pick the location where they can first perceive the distortion.

Light and Bartlein [22] warn against using the rainbow colormap, showing that it is highly confusing for color vision impaired users, using the example of temperature data covering North and South America. Borland [6] also criticizes the rainbow colormap for its lack of order. He compares different colormaps based on analytic test data that features a spectrum of changing frequencies, different surface shapes, and gradients. Kindlmann et al. [20] suggest a method to evaluate users' perception of luminance using a photograph of a human face. Schulze-Wollgast et al. [47] focus on the task of comparing data using statistical information on maps. Tominski et al. [57] also stress that the characteristics of the data, tasks, goals, user, and output device need to be taken into account. They introduce their task-color-cube, which gives recommendations for the different cases. They use cartographic data to demonstrate their findings.

Table 1. The most popular test data for colormap testing in the visualization literature.

  analytic data            [4, 6, 19, 26, 27, 29, 39, 43, 56, 59, 60, 61]
  statistics and maps      [8, 9, 13, 14, 27, 47, 51, 57, 64]
  medical imaging          [21, 33, 34, 42, 51, 59, 65]
  scientific measurements  [14, 16, 22, 29, 30, 42, 43, 46, 64, 65]
  scientific simulations   [3, 5, 29, 30, 42, 45, 46, 56]
  photographs              [20, 38, 51, 54]
Wang [59] chooses colors for illustrative visualizations using medical data and measurements of transmission electron microscopy (TEM), analytic jumps, and mixings of rectangles. Zeileis et al. [64] provide code to generate color palettes in the cylindrical coordinates of CIELUV and showcase results using geyser eruption data of Old Faithful and cartographic data. Moreland [29] presents an algorithm that generates diverging colormaps with a long path through CIELAB without sudden non-smooth bends. His red-blue diverging colormap is the current default in ParaView [2]. He tests different colormaps with data representing a spectrum of frequencies and gradients partly distorted by noise. He also stresses the importance of testing on 3D surfaces where shading and color mapping compete, e.g., the density on the surface of objects in flow simulation data or on 3D renderings of cartographic height data. Borland [5] collaborates with an application scientist working on urban airflow. They suggest combining existing colormaps to design domain-specific ones, and in case of doubt to stick with the black-body radiation map. They sacrifice traditional rules (e.g., order) to satisfy the needs (huge discriminative power) of the application. Eisemann et al. [13] separate the adaptation of the histogram of the data from the color mapping task, introducing an interactive pre-colormapping transformation for statistical information on maps. Thompson et al. [56] suggest applying special colors outside the usual gradient of the colormap to dominantly-occurring values, which are "prominent" values occurring with high frequency. Their test data includes the analytic Mandelbrot fractal and flow simulation results, which are partly provided as examples in ParaView. Brewer [8, 9] provides an online tool to choose carefully designed discrete colormaps. This is perhaps the most widely used tool for discrete colormaps. Mittelstädt et al.
[26, 27] present a tool that helps to find a suitable colormap for different task combinations. They showcase their findings with analytical data, like gradients and jumps, and real-world maps. Samsel et al. [45, 46] provide intuitive colormaps designed by an artist to visualize ocean simulations and scientific measurements in the environmental sciences. Fang et al. [14] present an optimization tool for categorical colormaps, and use the tool to improve the colormap of the London underground map and that of a seismological data visualization. Nardini et al. [30] provide an online tool, the CCC-Tool, for creating, editing, and analyzing continuous colormaps, demonstrating its uses with captured hurricane data, simulated ocean temperature data, and results of simulating ancient water formation.

All in all, we found that the most popular way of evaluating the quality of colormaps in the literature is the use of specifically designed analytic data like gradients, ridges, different surface shapes, fractals, jumps, or different frequencies, because these synthetic data sets help to identify specific properties of the colormaps. The second most common choice is cartographic maps, which reflects the historical use of color mapping. Furthermore, it is also common to use data from typical applications of scientific visualization as test data, e.g., fluid simulations (wind, ocean, turbulence), scientific measurements (weather, clouds, chemical concentration, temperature, elevation data), and medical imaging (x-ray, CT scan, digital subtraction fluorography, transmission electron microscopy). A summary can be found in Table 1. We have carefully designed our colormap test suite according to these findings, not only providing an extensive selection of expressive analytic data, but also containing real-world data from different scientific applications.
3 MOTIVATION
In several fields of computer science, the use of established test suites for evaluating techniques is standard or commonplace. The motivation for this paper is to introduce such a test suite to scientific visualization. So far, user testimonies and empirical studies have been the dominant means of evaluation in the literature. With this work, we would like to initiate the development of an open resource that the community can use to conduct extensive and rigorous tests on various colormap designs. We also anticipate that the community will contribute new tests to this resource continuously, including but not limited to tests for colormaps used in vector- or tensor-field visualization. Such a test suite can also provide user-centered evaluation methods with stimuli and case studies, while stimulating new hypotheses to be investigated using perceptual and cognitive experiments.

The development of testing functions in this paper deals with the common features that pose challenges in scalar analysis, such as jumps, local extrema, ridge or valley lines, different distributions of scalar values, different gradients, different signal frequencies, different levels of noise, and so on. This scope should be extended progressively in future work with more complex or specialized cases.

The main design goal of our test suite is to provide a set of intuitive functions, each of which deals with one particular challenge at a time. They should be easy to interpret and to customize by experts as well as non-expert users. This aspired simplicity in design can be exploited in future work to facilitate the automatic production of test reports or the automatic optimization of colormaps with respect to a selection of tests.

At present, this initial development should provide a set of test functions simulating a variety of planar scalar fields with different characteristic features.
It should enable users to observe the effects when different continuous colormaps are applied to scalar fields that have characteristic features similar to those featured in an application. In many situations, users may anticipate certain features in data sets that are yet to arrive, and would like to ensure that the color mapping can reveal such features effectively when the data arrives. Finding and defining a suitable test function is usually easier than manually creating a data set. In particular, unlike a synthetic data set, a test function is normally resolution-independent and is accompanied by some parameters for customization.

In addition, the test suite should provide users with data sets that come from real-world applications, possibly with some modification wherever appropriate. Such an application-focused collection can be compiled from the most popular data for colormap testing in the visualization literature. Since both the collection of test functions and that of real-world data sets are extensible, the field of visualization may soon see a powerful test suite for evaluating colormap design. This is desirable and in line with other computer science disciplines.
4 TEST SUITE
The first design goal of our test functions is to allow intuitive interpretation of colormap properties by users. This requires each test function to have an easily understandable behavior and a clear mathematical description that can be reproduced consistently across different implementations. The second design goal is to build the collection of test functions on the existing analytic examples in the literature surveyed in Sect. 2, to ensure that the existing experiments can be repeated and compared with new experiments. The third design goal is to help users find the test suite and conduct tests easily. Hence we integrate the test suite with the
CCC-Tool, allowing users to conduct tests immediately after creating or editing a colormap specification.

As mentioned in Sect. 1, our test suite has three parts: local tests, global tests, and a set of application data sets. The local tests are mostly based on analytic examples in the literature and are defined with considerations from calculus to cover most local properties of scalar functions. The global tests feature analytic properties of scalar functions that are not local, such as the signal-to-noise ratio, global topological properties, different levels of variation, etc. Finally, the application-specific data sets reflect the well-documented fact that colormaps should also be evaluated using real-world data sets.

Fig. 2. Left: The table shows the structure of the neighborhood with four elements (A = {a_0, a_1, a_2, a_3}). The odd indexed columns (yellow) always include the same value, increasing from the first to the last column. The even indexed columns (orange) contain the whole set of test values in increasing order. Right: Neighborhood variation test with A = {·, ·, ·, ·} for the colormap displayed below, including a three-dimensional version encoding the values through height.

Fig. 3. Three gradient tests, with r = ·, R = ·. Top row: Color mapping visualizations of the Gradient Variation function for the types linear, convex, and concave with T_x = T_y, and b = · for the first type and b = · for the other types. Bottom row: 3D height-map visualizations of the three gradient tests.

The mathematical notions in this section use a few common parameters. The user-defined parameters r, R ∈ ℝ with r ≠ R set the test function range; they determine the minimum m and maximum M of the test function, with m < M ∈ ℝ. With b ∈ ℕ, the user can select an exponent that describes the polynomial order. For functions with enumerated cases, the user can select a specific option T.

Our basic design principle behind the local tests is classical calculus. Local means that these test functions help to check the appearance of local properties of a scalar function after mapping it to color with the selected colormap. The main idea is to use typical local approximations, like low-order Taylor series expansions, to create the test functions. We use step functions to show the effect of discontinuities, and provide functions with different gradients, various local extrema, saddles, ridges, and valley lines. This corresponds to ideas in the literature, as shown in Sect. 2, e.g., works by Mittelstädt [26, 27] or Ware [60]. We also use elements of Fourier calculus by providing functions to test the effect of different frequencies. The final test looks at the colormap's potential to visually reveal small differences within the data range, which might be an important colormap design goal.
Fig. 4. This figure shows a 2D color mapping representation (top) and a 3D height-map (bottom) of the 2D scalar fields created with the test function, yielding a minimum with o = · and p = · (left), a maximum with o = −· and p = −· (middle), and a saddle with o = −· and p = · (right).

Fig. 5. Three ridge/valley-line tests (columns), with r = ·, R = ·. The ridge/valley-line is always centrally at x = 0. Top row: Color mapping with T_x = T_y = linear, T_x = T_y = concave, and T_x = T_y = convex, with b = · in the latter two cases. Bottom row: 3D height-map versions of the same tests.

Some popular test images in the literature use steps between adjacent pixels [26, 27, 59]. In terms of calculus, this means using a function with discontinuities. Ideally, the function should have different step heights starting from different levels. For this purpose, we define a set A = {a_0, ..., a_{n−1}} of increasing test values, a_i < a_{i+1}. The function is split into a rectangular grid with constant values. Uneven columns each contain one value of A, in ascending sequence, while even ones contain all of A in increasing order. This means that some steps appear multiple times, but the function is rather simple to remember; see Fig. 2. Formally, the function f_Step: [0, 2n) × [0, n) → ℝ is given by

    f_Step(x, y) = a_{⌊x⌋/2}   if ⌊x⌋ mod 2 = 0,
                   a_{⌊y⌋}     otherwise,                 (1)

where ⌊x⌋ denotes the largest integer smaller than or equal to x. An observer can quickly identify a column containing one concrete test value and compare it with all values using the adjacent column.

In the literature, the most common analytic example features gradient variation [4, 6, 26, 27, 29]. In terms of calculus, at first glance, this may suggest using a first-order Taylor approximation. However, the work by Ware [60] indicates that one should also test concave or convex properties, which means using polynomials of somewhat higher order, e.g., quadratic polynomials. For our
Gradient Variation test, we defined a test function f_Grad(x, y): [0, 1] × [0, 1] → ℝ to examine different gradients, including concave or convex patterns. The two options T_x, T_y determine whether the behavior is convex or concave along the x-axis or the y-axis. Basically, along each horizontal line, we start at r and interpolate to some value g(y) ∈ ℝ. The function g starts with g(0) = r and ends with g(1) = R. It may use a linear, convex, or concave interpolation along y. Its definition is

    g(y) = (R − r)(1 − (1 − y)^b) + r   if T_y = concave,
           (R − r) y^b + r              if T_y = convex.    (2)

If b equals one, then both cases describe a linear gradient along y, which we also denote as T_y = linear. For b ≥ 2, we get a concave shape with a decreasing gradient in the case T_y = concave, and a convex shape with an increasing gradient for T_y = convex.

Fig. 6. These two pictures show an example of the Frequency Variation test. Left: 2D color mapping of the test field. Right: 3D height-map visualization.

Fig. 7. This figure shows an example test image for each of the three Threshold Variation test types. From left to right: the linear, the flat, and the steep type. The top row shows the color mapping images; the bottom row shows the corresponding test images as height-maps.
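For concreteness, the step function of Eq. (1) and the boundary profile g of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the CCC-Tool implementation; the function names, the argument order, and the [0, 2n) × [0, n) domain convention are our assumptions.

```python
import math

def f_step(x, y, a):
    """Step-function test field (a sketch of Eq. 1).

    `a` is the increasing list of test values A = {a_0, ..., a_{n-1}};
    the domain is assumed to be [0, 2n) x [0, n).  Columns with even
    floor(x) hold one constant value a_{floor(x)/2}; the remaining
    columns run through all of A along the y-direction.
    """
    if math.floor(x) % 2 == 0:
        return a[math.floor(x) // 2]
    return a[math.floor(y)]

def g(y, r, R, b, t_y):
    """Boundary profile of the Gradient Variation test (a sketch of Eq. 2).

    g(0) = r and g(1) = R; t_y in {"concave", "convex"} selects the
    shape, and b = 1 reduces both cases to the linear type.
    """
    if t_y == "concave":
        return (R - r) * (1.0 - (1.0 - y) ** b) + r
    return (R - r) * y ** b + r
```

Evaluating such a sketch on a pixel grid and feeding the values through a candidate colormap reproduces test images in the style of Fig. 2.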
The actual test function is now defined in a similar manner by

    f_Grad(x, y) = (g(y) − r)(1 − (1 − x)^b) + r   if T_x = concave,
                   (g(y) − r) x^b + r              if T_x = convex.    (3)

Again, b = 1 describes a linear gradient along x in both cases, and we can write T_x = linear in this case. T_x = concave means a concave shape created by a decreasing gradient, while T_x = convex results in an increasing gradient and a convex shape. With R > r, we get an increasing function, and with R < r a decreasing one. Fig. 3 shows 2D and 3D visualizations of examples of the function f_Grad(x, y) for the three cases T_y = linear, concave, and convex.

Besides discontinuities and gradients, local extrema and saddles are the next types of structures in scalar fields, according to calculus. This has also been noted by Ware [60] and is the fundamental insight behind topological visualization methods [17]. As local extrema (of some significance or persistence) and saddles are of interest for most scalar field analysis tasks, we include them in our test suite. For the calculation of user-definable minima, maxima, and saddles, we use Equation 4:

    f_MMS(x, y) = o x² + p y² + m.    (4)

It is based on the well-known fact that stable critical points can be described by quadratic functions. We put the extremum or saddle at zero. The user can create a maximum with o < 0 ∧ p < 0, a minimum with o > 0 ∧ p > 0, and a saddle with (o > 0 ∧ p < 0) ∨ (o < 0 ∧ p > 0). The starting value of the structure is given by m ∈ ℝ. Fig. 4 shows an example of this test function with visualizations of minima, maxima, and saddle points.

Fig. 8. This figure shows four color mappings of global topological structures with a gradient test function as foundation; the corresponding 3D versions are shown at the bottom. Image A shows the pure noise using the replacement option. The following images show the options max-scaled (B), min-scaled (C), and range-scaled (D) with activated clipping and a limited value adjustment of 0.25. While B focuses the noise on higher foundation values, C does the same with lower ones. In contrast, D allows global adjustments of the foundation.

Fig. 9. The two pictures show the Little Bit Variation test (m = ·, M = ·, g_m = ·, g_M = ·), which can be used to test the impression of small value variations via the visibility of vertical grooves. The left image shows that the first groove, with an amplitude of 0.0001, is not perceptible for all values between 0.1 and 1.0. The right image shows a 3D height-map visualization of the same test.

Besides local extrema and saddle points, ridge and valley lines are further relevant topological shape descriptors. Again, this has been noted by Ware [60] with respect to color mapping. Also, the relevance of ridges and valley lines is well established in feature-based flow visualization [17]. To test the suitability of colormaps for scalar fields that include such lines, we use a function f_RV: [−1, 1] × [0, 1] → ℝ. The location of the ridge/valley-line is always at x = 0, and its height along the line is given by the function g that we introduced in Sect. 4.1.2, so it may be linear, convex, or concave according to the exponent b ∈ ℕ and the shape descriptor T_y. For the slope in the x-direction, we basically use the absolute value with exponent b, i.e., |x|^b on the interval [−1, 1]. This creates a concave shape. For convex shapes, we use the similar function 1 − (1 − |x|)^b. Both functions are adjusted to interpolate between r at |x| = 1 and g(y) at 0.
This is quite similar to the definition of g. We introduce the type parameter T_x, set it to "convex" or "concave", and arrive at the definition:

f_RV(x, y) = (r − g(y)) |x|^b + g(y),              if T_x = concave
f_RV(x, y) = (r − g(y)) (1 − (1 − |x|)^b) + g(y),  if T_x = convex    (5)

As in the gradient variation case, b = 1 results in a linear slope. For R > r, we get a ridge line; for R < r, we get a valley line. Fig. 5 shows an example test for ridge lines using the different T_x and T_y types.

4.1.5 Frequency Variation

Another common test for analyzing colormaps involves variations of spatial frequencies [6, 29, 43]. To check the behavior for wave-like functions, the
Frequency Variation Test uses an increasing frequency in the x-direction and a decreasing amplitude in the y-direction.

Fig. 10. These images show the different distributions for the signal noise option. To illustrate the noise, the pictures were created with the replacement option and a noise proportion of 100%. The distributions shown are: uniform, normal, beta, beta-left, and beta-right. Below each picture is a histogram.

Fig. 11. Examples of test functions used in optimization research that are also of interest for our test function collection. From left to right: A: Bukin Function Number 6; B: Langermann Function; C: Cross-in-Tray Function; D: Levy Function Number 13; E: Schwefel Function [18, 28].

In x-direction, we use an additional parameter D defining the number of frequency increases. Basically, we start with a single sine wave of frequency 1, which takes one unit of space in x-direction. Then we add a single sine wave of frequency 2 using half a unit of space, then a single sine wave of frequency 3 using one third of a unit of space. We continue in this way until we have D + 1 frequencies. The total x-range obviously depends on D, so we denote the segment boundaries as

x_0 = 0,  x_j = Σ_{k=1}^{j} 1/k.    (6)

The wave swings around the median value u with an amplitude W. Now, we can define the test function by

f_Freq: [x_0, x_{D+1}] × [0, 1] → R    (7)
f_Freq(x, y) = W (1 − y) sin(2π j (x − x_{j−1})) + u,  for x_{j−1} ≤ x ≤ x_j,  j = 1, ..., D + 1.

Fig. 6 shows a Frequency Variation test with six different frequencies. It should be noted that the image resolution is critical with regard to aliasing effects in case of high frequencies, i.e., when D is large.

4.1.6 Threshold Variation

In many scientific disciplines, natural thresholds, such as the freezing point of water at 0 °C, and the data distribution close to them are of significant importance for visual analysis.
This is also a major reason for the high relevance of isolines and isosurfaces in scientific visualization. It is possible to integrate isolines into a colormap by creating a discontinuous transition point in the colormap [30]. With the
Threshold Variation test, we created a function for testing a specific user-defined threshold t. The test function f_T spans the domain [−1, 1] × [−1, 1]. The isoline with the value t is a vertical line in the middle, i.e., at x = 0. The field values range between a minimum m and a maximum M, with m < t < M. To increase the visual information in the test image, the functions f_M, f_m: [−1, 1] → R change the minimum and maximum value linearly depending on y:

f_M(y) = (M + t)/2 − ((M − t)/2) y    (8)
f_m(y) = (t + m)/2 + ((t − m)/2) y    (9)

Fig. 12. Different real-world data sets we collected for our test suite. A: A thermal data set for algorithm training; B: Simulation data of asteroid impacts in deep ocean water (source: https://sciviscontest2018.org/); C: FTLE technique from the field of flow visualization [48]; D: Medical computer tomography scan of a head (source: https://graphics.stanford.edu/data/voldata/).

Fig. 13. A screenshot of the test evaluation in the CCC-Tool. We present statistics of the Value Difference Field, Color Difference Field, and
Subtraction Field. On the right side are two visualizations of the test function: one color mapping is done with a grey-scale colormap and the other one with the colormap selected for the analysis. Below are three color mapping images of the
Value Difference Field, Color Difference Field, and
Subtraction Field. All five images are zoomable and change interactively. As a sixth part, the pixel observer, in combination with the table, shows the pixel-neighborhood information of the three fields.
We can now define our test function f_T: [−1, 1] × [−1, 1] → R. Along each horizontal line, i.e., for fixed y, the value changes from f_m(y) at −1, reaches t at 0, and finally reaches f_M(y) at 1. We use three cases that we call T = linear, T = flat, and T = steep. The type describes the gradient behavior around the threshold. As in the cases for the gradient variation and ridge/valley lines, we use an exponent b ∈ N as final parameter; b = 1 yields the linear type.

f_T(x, y) = (f_m(y) − t) |x|^b + t,                if T = flat ∧ x ≤ 0
f_T(x, y) = (f_M(y) − t) |x|^b + t,                if T = flat ∧ x > 0
f_T(x, y) = (f_m(y) − t) (1 − (1 − |x|)^b) + t,    if T = steep ∧ x ≤ 0
f_T(x, y) = (f_M(y) − t) (1 − (1 − |x|)^b) + t,    if T = steep ∧ x > 0    (10)

In Fig. 7, a plot shows examples of all three types.

4.2 Global Tests

In contrast to the local tests, the global tests look at more global properties of scalar functions and how well the colormap presents them. First, we look at global topological properties. We use functions based on Perlin noise to create multiple local minima and maxima on different height levels and with different spatial structures; details can be found in Sect. 4.2.1. Second, it is a challenge for any colormap to deal with a large span of the overall values while small, but relevant, local value variations are also present. Any nearly linear colormap will completely overlook these so-called little bit variations. As test functions, we use linear functions with varying gradient and height as background, with small grooves added to include little bit variations. The definition is given in Sect. 4.2.2. Third, real-world data, especially images created by measurements, contain noise of various types and intensities, i.e., signal-to-noise ratios. We use functions from the local test suite and add uniformly or Gaussian-distributed noise of different signal-to-noise ratios; we describe the details in Sect. 4.2.3. Finally, we add a collection of test functions from other computer science disciplines to allow for tests using these functions.
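Before moving on to the global tests, the threshold construction is compact enough to sketch directly. A hedged Python sketch with our own naming, not the CCC-Tool's implementation; the two bounds follow our reading of Eqs. 8 and 9:

```python
import numpy as np

def f_T(x, y, m=-1.0, M=1.0, t=0.0, b=2, T="flat"):
    """Threshold Variation test on [-1, 1] x [-1, 1] (Eq. 10).
    The isoline with value t runs vertically at x = 0."""
    # Linearly varying bounds (Eqs. 8 and 9).
    fM = (M + t) / 2 - (M - t) / 2 * y
    fm = (t + m) / 2 + (t - m) / 2 * y
    side = np.where(x <= 0, fm, fM)   # which bound governs this half
    if T == "steep":                  # steep gradient around the threshold
        profile = 1 - (1 - np.abs(x)) ** b
    elif T == "flat":                 # flat gradient around the threshold
        profile = np.abs(x) ** b
    else:                             # "linear" is the case b = 1
        profile = np.abs(x)
    return (side - t) * profile + t

xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
field = f_T(xs, ys)
# The center column (x = 0) sits exactly on the threshold t = 0.
assert np.allclose(field[:, 2], 0.0)
```

Each horizontal line runs from f_m(y) through t up to f_M(y), with the chosen type controlling how quickly values leave the threshold.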
4.2.1 Global Topology

As noted above, other authors have indicated the relevance of critical points for testing colormaps before. In contrast to the local topology in Sect. 4.1.3, we use a larger number of critical points in the following test. For the creation of global topological structures, we take the 2D version of the improved noise algorithm introduced by Perlin [31, 32], which is often used for the creation of procedural textures or for terrain generation in computer games. The idea of this test function is to use some other test function from Sect. 4.1.2 to Sect. 4.2.2 as a foundation and combine this field with noise according to Perlin's work. These distorted gradients and shapes are in analogy with colormap testing functions specifically used to determine the discriminative power of subregions of colormaps [19, 29, 39, 61, 62].

To create the critical points, we use a noise function f_Noise(x, y) ∈ [−n, n] with n > 0. The options max-scaled, min-scaled, and range-scaled determine how the noise and the selected test function f_test are combined. The influence of the first two options depends on how close the local value f_test(x, y) is to M or m, respectively; this procedure creates noise that is focused on high or small values. With the range-scaled option, the adjustment of the local value is scaled with the test-function range from m to M. Furthermore, an optional clipping method for these three options prevents values outside [m, M]. Fourth, we offer replacement as a final option, where users can set a custom noise range N = [n_m, n_M] with f_Noise(x, y) ∈ N; with this option, the entries of the test function are replaced by the noise values.

f_test(x, y) = f_test(x, y) + f_Noise(x, y) · (f_test(x, y) − m)/(M − m),  if max-scaled
f_test(x, y) = f_test(x, y) + f_Noise(x, y) · (M − f_test(x, y))/(M − m),  if min-scaled
f_test(x, y) = f_test(x, y) + f_Noise(x, y) · (M − m),                      if range-scaled
f_test(x, y) = f_Noise(x, y),                                               if replacement    (11)

4.2.2 Little Bit Variation

The teaser Fig.
1 demonstrates that standard colormaps may easily lead to overlooked small value variations. For such cases, i.e., if small variations in the scalar field (within a small sub-range of the full data range) carry valuable information for interpretation, we define a test of the potential of a given colormap to visually resolve small perturbations. This is similar to distorted gradients, which appear quite frequently in the literature [19, 29, 39, 61, 62]. The
Little Bit Variation test

f_LB: [0, 2n + 1] × [0, 1] → R    (12)

uses a background function and adds a function f_G producing n small grooves. The background function in this test is a linear gradient along the y-direction, defined by a user-specified value range [m, M]. Along the x-direction, this background is modified by the function f_G:

f_LB(x, y) = m + (M − m) y − f_G(x)    (13)

The function f_G produces sine-shaped grooves for odd ⌊x⌋ and no changes for even ⌊x⌋. As x runs from 0 to 2n +
1, this creates exactly n grooves.

f_G(x) = 0,                          if ⌊x⌋ mod 2 = 0
f_G(x) = −f_A(x) sin(π (x − ⌊x⌋)),   otherwise    (14)

As can be seen, the sine wave's amplitude is controlled by a function f_A(x), which creates a test of different small value changes (groove depths). The function f_A(x) determines the depth of each groove by linear interpolation between a user-defined minimum g_m and maximum g_M. Fig. 9 shows an example of the Little Bit Variation test.

f_A(x) = g_m + (⌊x⌋ − 1)/(2 (n − 1)) · (g_M − g_m)    (15)

4.2.3 Signal Noise

In signal and data processing, noise plays an important role, and it also affects the results of scientific visualizations. Like the global topology test (Sect. 4.2.1), our tool offers to add noise to each test function (Sect. 4.1.2 - Sect. 4.2.2). The tool uses the standard random algorithm of JavaScript, which produces pseudo-random numbers in the range [0, 1) with uniform distribution. For the noise behavior, we offer the same options as in Sect. 4.2.1. Independent of the selected option, the fraction of noisy pixels can be set. This fraction describes how many randomly selected field values are affected by noise; if the noise proportion is set to 100%, the full test function is affected. For more flexibility, we also offer a conversion from the uniform distribution to a normal or a beta distribution. The conversion from uniform to normal is done with the Box-Muller transform [7]. With the normal distribution, the noise is more focused on weaker changes around zero for the min/max-scaled and range-scaled options. For the replacement option, the normal distribution causes a focus on values around the median of the defined range of noise values. The conversion from uniform to a beta-like distribution (with alpha, beta = 0.
5) is done with the equation beta
Random = sin²(r · π/2), with r being the result of the standard random generator. Adding noise with a beta distribution and the min/max-scaled or range-scaled options gives priority to values near the maximal change parameters m and −m. For the replacement option, values near the minimum and maximum of the defined noise value range are preferred. We modified this conversion so that the preference can be put on only one side, i.e., on m or −m in the first case, or on the maximum or the minimum in the other case. The modification mirrors the random values at the median toward the left or right side. This allows us to create, besides the symmetric beta-like distribution, a left-oriented and a right-oriented beta distribution. Fig. 10 shows the different distribution options.

4.2.4 Test Function Collection

Many domains of computer science use test functions for the evaluation of algorithms. There are several widespread, well-known functions like
Mandelbrot Set or Marschner Lobb and also functions like the
Six-Hump Camel Function from the teaser, which is better known in optimization [18, 24, 25]. Such functions and their different attributes could also be an enrichment for evaluation in scientific visualization. Therefore, we included a collection of such functions from the literature in our testing environment. These functions stand alongside our own test functions and provide further challenges for colormaps. We want to extend this collection with further functions of interest over time. In order to allow users to test their colormaps without changes, we allow the user to scale the values of these functions to the range of the colormap or to a user-defined range. Fig. 11 shows some examples of functions used for optimization. Obviously, they also have relevant properties for the evaluation of color mapping. For example, the
Bukin Function includes many small local minima along a valley line [18].

Fig. 14. This figure shows the application of the
Threshold test (Sect. 4.1.6) to improve the distinguishability of temperature variations around the South Pole, with focus on the freezing point (see close-ups in the lower right corner of a). Shown are a locally uniform, optimized cool-warm colormap and a modified colormap with a discontinuous transition point to emphasize the threshold. a: Visualization of the 2m-temperature of a high-resolution simulation with the global atmosphere model ICON. b: Threshold test function with a negative m, M = 53, and t = 0. c: Subtraction Field of the evaluation method (Sect. 5).
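As a concrete example of an entry in this collection, the Six-Hump Camel function from the teaser has a simple closed form. The sketch below samples it on its usual domain and rescales it to a user-defined range; the rescale helper is ours, not part of the CCC-Tool:

```python
import numpy as np

def six_hump_camel(x, y):
    """Six-Hump Camel function, a standard optimization benchmark [15, 18]."""
    return (4 - 2.1 * x**2 + x**4 / 3) * x**2 + x * y + (-4 + 4 * y**2) * y**2

def rescale(field, lo=0.0, hi=1.0):
    """Min-max scale a sampled field to a user-defined range [lo, hi]."""
    fmin, fmax = field.min(), field.max()
    return lo + (hi - lo) * (field - fmin) / (fmax - fmin)

# Sample the usual domain [-2, 2] x [-1, 1] and scale to [0, 1].
xs, ys = np.meshgrid(np.linspace(-2, 2, 400), np.linspace(-1, 1, 200))
field = rescale(six_hump_camel(xs, ys))
```

The two global minima (about −1.0316, near (±0.0898, ∓0.7126)) sit in the shallow central troughs that the teaser discusses; they occupy only a tiny fraction of the rescaled range, which is exactly what makes this function hard for uniformly spaced colormaps.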
4.3 Real-World Data

In the two previous sections, we described several analytic test functions concerning specific challenges encountered in color mapping. Additionally, we also introduced a collection of already existing test functions from other computer science domains. Nevertheless, we think that the involvement of real-world data is indispensable for the completion of this test suite. Real-world data originates from many different sources, is generated with various measurement techniques or simulation algorithms, and includes a myriad of attribute variations. Most importantly, such data can present several of the challenges described in the two previous sections at the same time. This kind of test cannot easily be replaced completely by our theory-based test functions. Therefore, we decided to include a set of application test data from different domains to cover a wide spectrum of realistic challenges.

Within one specific scientific domain, there is often a similarity between typical data sets; e.g., in medicine, data from MRI (Magnetic Resonance Imaging) or CT (Computer Tomography) is frequently used. Such data sets have similar attributes, and similar requirements have to be fulfilled by colormaps. If we cover typical data sets of different scientific disciplines, we can hopefully offer enough real-world test cases so that most users will find a case that has some similarities with their data. Like the test function collection from Sect. 4.2.4, this collection of real-world data will be extended over time. In the current version, the tool offers medical-, flow-, and photograph-specific real-world data.
5 Test Evaluation
Fig. 15. Starting with the modified colormap of Fig. 14, we used the Little Bit test (Sect. 4.2.2) to increase the number of noticeable values of positive temperatures. The view in the a-panels is centered on Africa. The modified colormap uses more hue variations to improve the Little Bit results. a: Visualization of the 2m-temperature of a high-resolution simulation with the global atmosphere model ICON. b: Little Bit test function. c: Color Difference Field of the evaluation method (Sect. 5).

There are usually good reasons to select specific colormaps or to design colormaps in a specific way. Depending on the actually envisaged purpose of the colormap, a user decides on the number of keys; the hue, saturation, and value of each key; the gradients in the mapping between the data range and the colormap; and so on. Furthermore, de facto standards and cognitive motivation may also influence the user's choice. Therefore, meaningful automated evaluation of continuous colormaps without knowledge of their intended use is rarely feasible, and a general colormap score computed from automatic tests and benchmarks might not be informative.

Instead, we propose to derive information based on the aforementioned test functions that can be analyzed and rated by users themselves. A user first chooses a test function from Sect. 4. For each grid point of the generated test field, we calculate the value differences to the neighboring grid points. Depending on the location within the field, the number of neighbors varies between three and eight. We normalize these value differences with the minimum and maximum value differences found and save them into a
Value Difference Field. We repeat this process for the colors: here, we use a color difference norm (Lab, DIN99, DE94, or CIEDE2000) and save the normalized values into the
Color Difference Field. By subtracting these two fields from one another, we get a
Subtraction Field. This field represents the local uniformity of the color mapping: when the local gradients found in the data are represented accordingly in the color-mapped field, the difference between the normalized data field and the normalized color-mapped field is zero for all pixels/locations. In the case of a non-linear color mapping, in contrast, the
Subtraction Field will particularly highlight areas with a strongly non-linear mapping, which the user might have designed intentionally in order to increase the number of discriminable colors for a part of the data range. The user can study the
Color Difference Field as well as the
Subtraction Field to analyze the color mapping of the test function. Each of the three fields has three to eight values for each pixel. For the color mapping (Fig. 13), the user can select maximum, average, or median. In addition, there are options to select a method for the calculation of the color difference: the tool offers the Euclidean distance in Lab or DIN99 space, or the DE94 and CIEDE2000 metrics in Lab space. To compare the visualizations of
Color Difference Field of different colormaps, we cannot use the normalization by minimum and maximum: the colors of such color mappings would relate to different color difference values and would not be comparable. Therefore, we implemented two alternative options that use fixed values for the minimum and maximum of the normalization to create comparable results. The
Black-White normalization uses the greatest possible color difference, between black and white, as maximum and zero as minimum. The
Custom normalization uses a user-entered maximum, which is a necessity if the black-white difference is too big compared with the occurring color differences of the
Color Difference Fields. In Fig. 14, we used this third option to get a comparable visualization for a colormap with a discontinuous transition point.
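A hedged end-to-end sketch of this evaluation (our naming; simplified to a single channel with Euclidean differences instead of the tool's selectable Lab/DIN99/DE94/CIEDE2000 metrics):

```python
import numpy as np

def max_neighbor_diff(field):
    """Per-pixel maximum absolute difference to the existing neighbors
    (3, 5, or 8 of them, depending on the pixel's position)."""
    h, w = field.shape
    out = np.zeros((h, w))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy, dx) == (0, 0):
                continue
            py = slice(max(0, -dy), h - max(0, dy))   # pixels with this neighbor
            px = slice(max(0, -dx), w - max(0, dx))
            ny = slice(max(0, dy), h - max(0, -dy))   # the neighbor pixels
            nx = slice(max(0, dx), w - max(0, -dx))
            d = np.abs(field[py, px] - field[ny, nx])
            out[py, px] = np.maximum(out[py, px], d)
    return out

def normalized(diffs):
    """Normalize by the minimum and maximum difference found."""
    lo, hi = diffs.min(), diffs.max()
    return (diffs - lo) / (hi - lo) if hi > lo else np.zeros_like(diffs)

test_field = np.linspace(0, 1, 64).reshape(8, 8)
# Stand-in for the color-mapped image: a perfectly linear "colormap" whose
# color distance is proportional to the value distance (one channel suffices).
color_channel = 100.0 * test_field
value_diff = normalized(max_neighbor_diff(test_field))
color_diff = normalized(max_neighbor_diff(color_channel))
subtraction_field = value_diff - color_diff  # ~0 for a locally uniform mapping
```

For this linear stand-in colormap the Subtraction Field is zero everywhere, which matches the nearly white Subtraction Field images that a locally uniform colormap produces in the tool.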
6 Application Case
In this section, we show how the test suite can be utilized to evaluate the suitability of colormaps with respect to a given application problem. For this example, we chose a data set from a simulation with a high-resolution global atmosphere model. The data we use is one timestep of the temperature at a height of 2 m, simulated with the icosahedral ICON model at a global resolution of 5 km [53]. We remapped the data from the unstructured model grid to a regular longitude-latitude grid with a width of 4000 points. Globally, the simulated 2m-temperature covers a wide range, from far below the freezing point up to about 52 °C for the selected time step. Regionally, however, small temperature variations of the order of tenths of a degree might be critical for the analysis, e.g., in the neighborhood of the freezing point at 0 °C.

Panel a of Fig. 14 shows a visualization of the data using a spherical projection with a focus on the South Pole. In contrast to mountainous regions, where the horizontal 2m-temperature gradient is generally high, the gradient in flat areas such as oceanic regions is much smaller. Here, the color differences are too small to depict local temperature variations, for example, in regions with values close to 0 °C, as shown in the close-up in the lower right corner of the image. To test a given colormap for its discriminative power in the data range around the freezing point, we applied a threshold test with the options
Flat-Surrounding, a negative minimum m, M = 53, and t = 0. First, we start with a locally uniform cool-warm colormap (Fig. 14). The related test-function visualization demonstrates that it is impossible to differentiate between negative and positive values if the values are close to 0 °C. The Subtraction Field method of the test evaluation part (Sect. 5) yields a nearly white image, which reflects that the colormap uniformly represents the gradients produced by the test function. To highlight the freezing point in the mapping, we introduce a non-linearity in the colormap at 0 °C. We use the twin key option of the CCC-Tool colormap specification (CMS) [30], which separates the color key at 0 °C into a left and a right color key to create the discontinuous transition. To improve the visual difference between both sides, we slightly lower the lightness value and increase the saturation of the left color to achieve a light blue; we keep white as the right-hand part of the color key. Fig. 14 illustrates that the introduced discontinuity in the colormap clearly separates the areas with negative and positive temperature values: the Subtraction Field now shows the spatial position of the discontinuous transition at 0 °C as a vertical red line (panel c), and the corresponding visualization of the temperature field with the modified colormap is shown in panel a.

If we visualize the global 2m-temperature field using a linear colormap and look at the tropics or the mid-latitudes, we find that regional variations are also not very well resolved. Using the same colormap, Fig. 15 shows a different view onto our planet than Fig. 14. The resolving power of the linear colormap is equally distributed over the full data range. However, when we analyze the global temperature distribution, we find that more than half of the data range is used for the temperature variations far below 0 °C, mostly in Antarctica, although this information is less important for most users of such a data set.
With respect to vegetation and agriculture, we may want to put more focus on regions with temperatures mostly above 0 °C. Therefore, we extended the path of the colormap through the color space to get more distinguishable colors for the positive data range. We used a
Little Bit test to control improvements during this process. Panel a of Fig. 15 shows a visualization using the colormap with the discontinuous transition introduced above; the corresponding Little Bit test is shown in panel b. For the evaluation, we used the Color Difference Field (Sect. 5). Panel c shows how the small grooves in the linear gradient of the Little Bit test function (that are hardly noticeable in b) become clearly visible in the color difference field. From left to right, the regularly spaced perturbations in the field increase in magnitude, which is represented by a stripe pattern in panel c that increases in contrast from left to right. The vertically constant color of the stripe pattern is a direct consequence of a linear colormap.

However, as we wanted to increase the discriminative power in the upper part of the colormap, we inserted additional color keys. First, we moved the blue part of the colormap representing negative values slightly away from cyan. The freed color space was utilized to represent the lower positive temperatures: a gradient from white to cyan for 0 °C-10 °C is followed by a gradient from cyan to green to represent the moderate temperature range of 10 °C-20 °C. A subsequent gradient from yellow through beige to light brown shows values between 20 °C and 40 °C, and a further transition to dark red finally shows the higher temperature range of up to 53 °C. Our colormap semantics were designed to roughly differentiate between five temperature zones: very cold (blue to light blue), moderately cool (white to cyan), moderately warm (cyan to green), warm (green to yellow to beige), and hot (red). Concerning red-green colorblind viewers, we used a lower, non-overlapping lightness range for the red gradient compared to the green gradient. The respective color gradients were separately optimized for local uniformity. Panels a and b of Fig. 15 show the visualizations of the temperature data and the test function with the modified colormap.
Note that we used the Little Bit test function only for the upper part of the colormap, which corresponds to the positive temperature values. As a result of our modifications of the colormap, it is now possible to see much more detail in the inhabited part of our planet and to distinguish between the different temperature zones. Compared to the unmodified colormap, the Color Difference Field shows an increase in the color differences at the expense of the local uniformity of the positive data range.

7 Conclusion
In this paper, we have introduced the approach of using test functions as a standard evaluation method, and we have presented a test suite for continuous colormaps. As in other fields of computer science, such test functions can be used alongside user-centered evaluation (e.g., user testimonies and empirical studies). In comparison with user-centered evaluation, there is no need to recruit participants, design questionnaires or stimuli, organize payment, or arrange experiment time, environment, and apparatus. Evaluating colormaps using the test suite can be conducted quickly and easily. The designer can test many candidate colormaps against many test functions and data sets, which is usually not feasible with user-centered evaluation, and the same tests can be repeated with consistent control and comparability.

For the test suite, we first focused on the specific challenges of scalar fields. Sect. 4.1.1-4.2.2 describe the test functions we chose to address these challenges. To help users with a less mathematical background, we tried to develop intuitive functions that are simple and easy to interpret. The test suite currently includes step functions, different gradients, minima, maxima, saddle points, ridge and valley lines, global topology, thresholds, different frequencies, and a test for very small value changes. Although these test functions cannot cover all possible challenges, we have laid down a solid foundation that can be extended continually. We have also included the option to add noise to extend the possibilities of the basic test functions.

Besides our newly designed functions, we have presented in Sect. 4.2.4 a collection of functions used for evaluation in other computer science fields; we think they will prove to be useful for the evaluation of colormaps as well. Furthermore, we have included an initial selection of real-world data sets from different application areas. As described in Sect. 4.3, tests against real-world data are important in practice.
Each real-world data set in our test suite presents an individual combination of the challenges of scalar field analysis. Here, our intention is to provide broad coverage such that users are less dependent on external data. Our test suite has been integrated into the open-access CCC-Tool. In Sect. 5, we describe means to evaluate the results of the test functions visually and numerically, which we have also implemented in our online tool. An example of using the test suite to evaluate and enhance a user-designed colormap for a specific application problem is presented and discussed in Sect. 6.

As a long-term perspective, we plan to continue extending our collection. One option for real-world data would be an open-source database with a web interface and a link to our tool. To help establish the test suite as a standard evaluation method, we would also like to work on automatic test reports, which can perform an automatic analysis of a colormap with a set of tests chosen by the user.
EFERENCES [1] H. E. Aguirre and K. Takadama, eds.
Proceedings of the Genetic andEvolutionary Computation Conference, GECCO 2018, Kyoto, Japan, July15-19, 2018 . ACM, 2018. doi: 10.1145/3205455[2] J. Ahrens, B. Geveci, and C. Law. 36 - paraview: An end-user tool for large-data visualization. In C. D. Hansen and C. R. Johnson, eds.,
VisualizationHandbook , pp. 717 – 731. Butterworth-Heinemann, Burlington, 2005. doi:10.1016/B978-012387582-2/50038-1[3] L. D. Bergman, B. E. Rogowitz, and L. A. Treinish. A rule-based toolfor assisting colormap selection. In
Proceedings of the 6th conference onVisualization’95 , p. 118. IEEE Computer Society, 1995.[4] J. Bernard, M. Steiger, S. Mittelst¨adt, S. Thum, D. Keim, and J. Kohlham-mer. A survey and task-based quality assessment of static 2D colormaps.In
SPIE/IS&T Electronic Imaging , pp. 93970M–93970M. InternationalSociety for Optics and Photonics, 2015.[5] D. Borland and A. Huber. Collaboration-specific color-map design.
IEEEComputer Graphics and Applications , 31(4):7–11, July 2011. doi: 10.1109/MCG.2011.55[6] D. Borland and R. M. Taylor II. Rainbow color map (still) consideredharmful.
IEEE computer graphics and applications , 27(2):14–17, 2007.[7] G. E. P. Box and M. E. Muller. A note on the generation of random normaldeviates.
The Annals of Mathematical Statistics , 29(2):610–611, 1958.[8] C. Brewer.
Designing Better Maps: A Guide for Gis Users . EnvironmentalSystems Research, 2004.[9] C. A. Brewer. Color use guidelines for mapping.
Visualization in moderncartography , pp. 123–148, 1994.[10] R. Bujack, T. L. Turton, F. Samsel, C. Ware, D. H. Rogers, and J. Ahrens.The good, the bad, and the ugly: A theoretical framework for the assess-ment of continuous colormaps.
IEEE Transactions on Visualization andComputer Graphics , 24(1):923–933, Jan 2018. doi: 10.1109/TVCG.2017.2743978[11] R. E. Burkard, S. E. Karisch, and F. Rendl. Qaplib – a quadratic assignmentproblem library.
Journal of Global optimization , 10(4):391–403, 1997.[12] H. Childs, E. Brugger, B. Whitlock, J. Meredith, S. Ahern, D. Pugmire,K. Biagas, M. Miller, C. Harrison, G. Weber, H. Krishnan, T. Fogal,A. Sanderson, C. Garth, E. W. Bethel, D. Camp, O. Rubel, M. Durant,J. Favre, and P. Navratil. Visit: An end-user tool for visualizing andanalyzing very large data.
High Performance Visualization-EnablingExtreme-Scale Scientific Insight , pp. 357–372, 10 2012.[13] M. Eisemann, G. Albuquerque, and M. Magnor. Data driven color map-ping. In
Proceedings of EuroVA: International Workshop on Visual Ana-lytics, Bergen, Norway . Citeseer, 2011.[14] H. Fang, S. Walton, E. Delahaye, J. Harris, D. A. Storchak, and M. Chen.Categorical colormap optimization with visualization case studies.
IEEETransactions on Visualization and Computer Graphics , 23(1):871–880,Jan 2017. doi: 10.1109/TVCG.2016.2599214[15] C. Floudas and P. Pardalos.
Handbook of test problems in local andglobal optimization . Nonconvex optimization and its applications. KluwerAcademic Publishers, 1999.[16] D. L. Gresh. Self-corrected perceptual colormaps. Technical report, IBM,2008. RC24542 (W0804-104).[17] C. Heine, H. Leitte, M. Hlawitschka, F. Iuricich, L. De Floriani,G. Scheuermann, H. Hagen, and C. Garth. A survey of topology-basedmethods in visualization.
Computer Graphics Forum, 35(3):643–667, 2016.
[18] M. Jamil and X.-S. Yang. A literature survey of benchmark functions for global optimization problems. CoRR, abs/1308.4008, 2013.
[19] A. D. Kalvin, B. E. Rogowitz, A. Pelah, and A. Cohen. Building perceptual color maps for visualizing interval data. In Human Vision and Electronic Imaging V, vol. 3959, pp. 323–336. International Society for Optics and Photonics, 2000.
[20] G. Kindlmann, E. Reinhard, and S. Creem. Face-based luminance matching for perceptual colormap generation. In Proceedings of the Conference on Visualization '02, pp. 299–306. IEEE Computer Society, 2002.
[21] H. Levkowitz and G. T. Herman. The design and evaluation of color scales for image data. IEEE Computer Graphics and Applications, 12(1):72–80, 1992.
[22] A. Light and P. J. Bartlein. The end of the rainbow? Color schemes for improved data graphics. Eos, 85(40):385–391, 2004.
[23] M. R. Luo, G. Cui, and B. Rigg. The development of the CIE 2000 colour-difference formula: CIEDE2000. Color Research & Application, 26(5):340–350, 2001.
[24] B. B. Mandelbrot. Fractal aspects of the iteration of z → λz(1−z) for complex λ and z. Annals of the New York Academy of Sciences, 357(1):249–259, 1980. doi: 10.1111/j.1749-6632.1980.tb29690.x
[25] S. R. Marschner and R. J. Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings Visualization '94, pp. 100–107, 1994.
[26] S. Mittelstädt, D. Jäckle, F. Stoffel, and D. A. Keim. ColorCAT: Guided Design of Colormaps for Combined Analysis Tasks. In E. Bertini, J. Kennedy, and E. Puppo, eds., Eurographics Conference on Visualization (EuroVis) - Short Papers, pp. 115–119. The Eurographics Association, 2015.
[27] S. Mittelstädt, A. Stoffel, and D. A. Keim. Methods for compensating contrast effects in information visualization. In Computer Graphics Forum, vol. 33, pp. 231–240. Wiley Online Library, 2014.
[28] M. Molga and C. Smutnicki. Test functions for optimization needs. 2005.
[29] K. Moreland. Diverging color maps for scientific visualization. In
International Symposium on Visual Computing, pp. 92–103. Springer, 2009.
[30] P. Nardini, M. Chen, F. Samsel, R. Bujack, M. Böttinger, and G. Scheuermann. The making of continuous colormaps. IEEE Transactions on Visualization and Computer Graphics, 2019. doi: 10.1109/TVCG.2019.2961674
[31] K. Perlin. An image synthesizer. SIGGRAPH Comput. Graph., 19(3):287–296, July 1985. doi: 10.1145/325165.325247
[32] K. Perlin. Improving noise. ACM Trans. Graph., 21(3):681–682, July 2002. doi: 10.1145/566654.566636
[33] S. H. Pizer, J. B. Zimmerman, and R. E. Johnston. Concepts of the display of medical images. IEEE Transactions on Nuclear Science, 29(4):1322–1330, 1982.
[34] S. M. Pizer. Intensity mappings to linearize display devices. Computer Graphics and Image Processing, 17(3):262–268, 1981.
[35] G. Reinelt. TSPLIB – a traveling salesman problem library. ORSA Journal on Computing, 3(4):376–384, 1991.
[36] P. L. Rheingans. Task-based color scale design. In , pp. 35–43. International Society for Optics and Photonics, 2000.
[37] P. K. Robertson and J. F. O'Callaghan. The generation of color sequences for univariate and bivariate mapping. IEEE Computer Graphics and Applications, 6(2):24–32, 1986.
[38] B. E. Rogowitz and A. D. Kalvin. The "Which Blair Project": a quick visual method for evaluating perceptual color maps. In Visualization, 2001. VIS '01. Proceedings, pp. 183–556. IEEE, 2001.
[39] B. E. Rogowitz, A. D. Kalvin, A. Pelah, and A. Cohen. Which trajectories through which perceptually uniform color spaces produce appropriate color scales for interval data? In , pp. 321–326. Society for Imaging Science and Technology, 1999.
[40] B. E. Rogowitz, D. T. Ling, and W. A. Kellogg. Task dependence, veridicality, and preattentive vision: taking advantage of perceptually rich computer environments. In Human Vision, Visual Processing, and Digital Display III, vol. 1666, pp. 504–514. International Society for Optics and Photonics, 1992.
[41] B. E. Rogowitz and L. A. Treinish. Using perceptual rules in interactive visualization. In Human Vision, Visual Processing, and Digital Display V, vol. 2179, pp. 287–296. International Society for Optics and Photonics, 1994.
[42] B. E. Rogowitz and L. A. Treinish. Data visualization: the end of the rainbow.
IEEE Spectrum, 35(12):52–59, 1998.
[43] B. E. Rogowitz, L. A. Treinish, and S. Bryson. How not to lie with visualization. Computers in Physics, 10(3):268–273, 1996.
[44] F. Samsel, M. Petersen, T. Geld, G. Abram, J. Wendelberger, and J. Ahrens. Colormaps that improve perception of high-resolution ocean data. In Proc. ACM CHI (Extended Abstracts), pp. 703–710, 2015.
[45] F. Samsel, M. Petersen, T. Geld, G. Abram, J. Wendelberger, and J. Ahrens. Colormaps that improve perception of high-resolution ocean data. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '15, pp. 703–710, 2015. doi: 10.1145/2702613.2702975
[46] F. Samsel, T. L. Turton, P. Wolfram, and R. Bujack. Intuitive Colormaps for Environmental Visualization. In R. Bujack, A. Middel, K. Rink, and D. Zeckzer, eds., Workshop on Visualisation in Environmental Sciences (EnvirVis), pp. 55–59. The Eurographics Association, 2017. doi: 10.2312/envirvis.20171105
[47] P. Schulze-Wollgast, C. Tominski, and H. Schumann. Enhancing visual exploration by appropriate color coding. In Proceedings of International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), pp. 203–210, 2005.
[48] S. C. Shadden, F. Lekien, and J. E. Marsden. Definition and properties of Lagrangian coherent structures from finite-time Lyapunov exponents in two-dimensional aperiodic flows. Physica D: Nonlinear Phenomena, 212(3-4):271–304, 2005.
[49] S. Silva, J. Madeira, and B. S. Santos. There is more to color scales than meets the eye: a review on the use of color in visualization. In Information Visualization, 2007. IV '07. 11th International Conference, pp. 943–950. IEEE, 2007.
[50] S. Silva, B. S. Santos, and J. Madeira. Using color in visualization: A survey. Computers & Graphics, 35(2):320–333, 2011.
[51] K. R. Sloan and C. M. Brown. Color map techniques. Computer Graphics and Image Processing, 10(4):297–317, 1979.
[52] D. Stalling, M. Westerhoff, H.-C. Hege, et al. Amira: A highly interactive system for visual data analysis. The Visualization Handbook, 38:749–767, 2005.
[53] B. Stevens, C. Acquistapace, A. Hansen, R. Heinze, C. Klinger, D. Klocke, H. Rybka, W. Schubotz, J. Windmiller, P. Adamidis, I. Arka, V. Barlakas, J. Biercamp, M. Brueck, S. Brune, S. Buehler, U. Burkhardt, G. Cioni, M. Costa-Suros, S. Crewell, T. Crüger, H. Deneke, P. Friederichs, C. Henken, C. Hohenegger, M. Jacob, F. Jakub, N. Kalthoff, M. Köhler, T. van Laar, P. Li, U. Löhnert, A. Macke, M. Madenach, B. Mayer, C. Nam, A. Naumann, K. Peters, S. Poll, J. Quaas, N. Röber, N. Rochetin, L. Scheck, V. Schemann, S. Schnitt, A. Seifert, F. Senf, M. Shapkalijevski, C. Simmer, S. Singh, O. Sourdeval, D. Spickermann, J. Strandgren, O. Tessiot, N. Vercauteren, J. Vial, A. Voigt, and G. Zängl. The added value of large-eddy and storm-resolving models for simulating clouds and precipitation. Journal of the Meteorological Society of Japan. Ser. II, advance publication, 2020. doi: 10.2151/jmsj.2020-021
[54] J. Tajima. Uniform color scale applications to computer graphics.
Computer Vision, Graphics, and Image Processing, 21(3):305–325, 1983.
[55] Text REtrieval Conference (TREC). Data. https://trec.nist.gov/data.html, accessed in July 2020.
[56] D. C. Thompson, J. Bennett, C. Seshadhri, and A. Pinar. A provably-robust sampling method for generating colormaps of large data. In LDAV, pp. 77–84, 2013.
[57] C. Tominski, G. Fuchs, and H. Schumann. Task-driven color coding. In , pp. 373–380. IEEE, 2008.
[58] H. Wainer and C. M. Francolini. An empirical inquiry concerning human understanding of two-variable color maps. The American Statistician, 34(2):81–93, 1980.
[59] L. Wang, J. Giesen, K. McDonnell, P. Zolliker, and K. Mueller. Color design for illustrative visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1739–1754, Nov 2008. doi: 10.1109/TVCG.2008.118
[60] C. Ware. Color sequences for univariate maps: Theory, experiments and principles. IEEE Computer Graphics and Applications, 8(5):41–49, 1988.
[61] C. Ware, T. L. Turton, F. Samsel, R. Bujack, and D. H. Rogers. Evaluating the Perceptual Uniformity of Color Sequences for Feature Discrimination. In K. Lawonn, N. Smit, and D. Cunningham, eds., EuroVis Workshop on Reproducibility, Verification, and Validation in Visualization (EuroRV3), pp. 7–11. The Eurographics Association, 2017. doi: 10.2312/eurorv3.20171107
[62] C. Ware, T. L. Turton, F. Samsel, R. Bujack, D. H. Rogers, K. Lawonn, N. Smit, and D. Cunningham. Evaluating the perceptual uniformity of color sequences for feature discrimination. In EuroVis Workshop on Reproducibility, Verification, and Validation in Visualization (EuroRV3). The Eurographics Association, 2017.
[63] Wikipedia. SemEval (semantic evaluation). https://en.wikipedia.org/wiki/SemEval, accessed in July 2020.
[64] A. Zeileis, K. Hornik, and P. Murrell. Escaping RGBland: selecting colors for statistical graphics. Computational Statistics & Data Analysis, 53(9):3259–3270, 2009.
[65] H. Zhang and E. Montag. Perceptual color scales for univariate and bivariate data display. The Society for Imaging Science and Technology (IS&T), 2006.
[66] L. Zhou and C. Hansen. A survey of colormaps in visualization.