Non-Uniform Gaussian Blur of Hexagonal Bins in Cartesian Coordinates
Reinier Vleugels∗
Department of Computer Science, IBIVU Centre for Integrative Bioinformatics, Vrije Universiteit Amsterdam
and
Magnus Palmblad
Center for Proteomics and Metabolomics, Leiden University Medical Center
May 21, 2020
Abstract
In a recent application of the Bokeh Python library for visualizing physico-chemical properties of chemical entities text-mined from the scientific literature, we found ourselves facing the task of smoothing hexagonally binned data in Cartesian coordinates. To the best of our knowledge, no documentation on how to do this exists in the public domain. This short paper shows how to accomplish this in general and for Bokeh in particular. We illustrate the method with a real-world example and discuss some potential advantages of using hexagonal bins in these and similar applications.
Keywords:
Binning, blurring, Bokeh, histograms, tessellation

∗ The first author is an MSc student at the Department of Computer Science, IBIVU Centre for Integrative Bioinformatics, Vrije Universiteit Amsterdam. This work was conducted as part of his thesis.

arXiv [cs.GR], May 2020

Introduction
Hexagonal binning is a popular alternative for creating two-dimensional histograms that captures most shapes better than the more common rectilinear binning methods (Carr et al., 1987). Hexagons are more similar to circles than squares are, which means data is aggregated more tightly around the bin center. The minimum distance from a bin center to the nearest bin border is ~ […] axial or trapezoidal coordinates used by the Bokeh Python library (Bokeh Development Team, 2019), the offset coordinates used by the TikZ Shapes LaTeX library (also used to generate Figures 1-5 in this paper) and cube coordinates with three axes. Other software, including the R tess package (Höhna, 2013), uses a one-dimensional offset enumeration to address individual hexagons. In this paper we will focus on offset and axial coordinates, though any hexagonal coordinate or numbering system can be transformed into Cartesian coordinates.

It is sometimes advantageous to smooth images or histograms to reduce noise and emphasize features of interest. Blurring allows data visualization at a higher resolution without requiring large numbers of counts in each bin. Blurring can also be used to represent uncertainty in measurements or predictions. When the hexagonal tiles represent spatial data such as a map or image, blurring can be done in the hexagonal coordinate system, with the distance between two tiles being the minimum number of steps between them. Each hexagon thus has 6 neighboring tiles at distance 1, 12 tiles at distance 2, 18 at distance 3, etc., or in general 6n tiles in the n-th shell around a given tile (Figure 1). Blurs based on this distance metric are approximately radial.

Figure 1: Hexagonal tiles with offset hexagonal coordinates (x, y) in two shells around (0,0). The x axis is here drawn in the middle of the tiles with x = 0.

However, in one of our use cases, based on previous work (Palmblad, 2019), the two dimensions do not represent spatial locations as in a map or image, but two different physical variables with different underlying uncertainties. To smooth the histogram and visualize these uncertainties, we have to use the original physical dimensions or coordinates. As we still wish to use hexagonal bins, for the reasons mentioned above, we have to evaluate the smoothing function in the Cartesian coordinates of the bins, e.g. the bin centers.
Note that we do not wish to smooth the image. The hexagonal bins should remain distinct to allow the user to interact with the binned data. Here we show how this can be done, and demonstrate the result using Bokeh. For a more comprehensive and interactive overview of the many different hexagonal grid systems, see Amit Patel's excellent blog post on the subject (Patel, 2013).

Figure 2: Hexagonal tiles with axial (trapezoidal) coordinates (q, r) in the same two shells around (0,0). This is the hexagonal coordinate system used by Bokeh.
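The shell counts quoted in the Introduction can be reproduced in the axial coordinates of Figure 2. The following is a minimal sketch, not code from SCOPE or Bokeh: `hex_distance` and `shell` are hypothetical helper names, and the distance formula is the standard step-count identity for axial coordinates.

```python
def hex_distance(q1, r1, q2=0, r2=0):
    """Minimum number of tile-to-tile steps between two axial coordinates."""
    dq, dr = q1 - q2, r1 - r2
    # |dq| + |dr| + |dq + dr| is always even, so integer division is exact.
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def shell(n):
    """All axial coordinates (q, r) at hexagonal distance n from (0, 0)."""
    return [
        (q, r)
        for q in range(-n, n + 1)
        for r in range(-n, n + 1)
        if hex_distance(q, r) == n
    ]


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, len(shell(n)))  # shell n should contain 6n tiles
```

A blur defined on this step distance treats all tiles of a shell alike, which is why it is approximately radial and unsuitable when the two axes carry different uncertainties.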
A hexagon can be seen as made up of six equilateral triangles. From elementary trigonometry, we know that the ratio of the height to the side in an equilateral triangle is sin 60° or √3/2. With the hexagon side as unit length, the centers of vertically adjacent bins are therefore √3 apart, adjacent columns are 3/2 apart, and odd columns are shifted vertically by √3/2. The transformations from offset hexagonal (H) coordinates (Figure 1) to Cartesian (C) ones are thus:

\[
\begin{pmatrix} x \\ y \end{pmatrix}_C =
\begin{cases}
\begin{pmatrix} 3/2 & 0 \\ 0 & \sqrt{3} \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}_H
& \text{if } x_H \equiv 0 \pmod{2} \\[2ex]
\begin{pmatrix} 3/2 & 0 \\ 0 & \sqrt{3} \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}_H
+ \begin{pmatrix} 0 \\ \sqrt{3}/2 \end{pmatrix}_C
& \text{if } x_H \equiv 1 \pmod{2}
\end{cases}
\qquad (1)
\]

Because axial coordinates have a fixed, oblique angle between the axes, the transformation from the hexagonal coordinates q and r in Figure 2 to Cartesian coordinates is independent of coordinate parity:

\[
\begin{pmatrix} x \\ y \end{pmatrix}_C =
\begin{pmatrix} 3/2 & 0 \\ -\sqrt{3}/2 & -\sqrt{3} \end{pmatrix}
\begin{pmatrix} q \\ r \end{pmatrix}_H
\qquad (2)
\]

This transformation is illustrated in Figure 3 below. The Cartesian coordinates here refer to the centers of the hexagonal bins:

Figure 3: Transformation of the axial coordinates (0,0), (0,-1), (1,-1) and (2,-1) to the Cartesian coordinates (0,0), (0, √3), (3/2, √3/2) and (3,0). The Cartesian coordinates corresponding to (q, r) in axial coordinates are (3q/2, √3(−q/2 − r)). The Cartesian coordinates corresponding to (x, y) in offset hexagonal coordinates (Figure 1) are (3x/2, √3 y) for even columns (x mod 2 = 0) and (3x/2, √3 y + √3/2) for odd (x mod 2 = 1).

As the two Cartesian dimensions in our use case represent different physical dimensions with different uncertainties of measurement or estimation, the smoothing should not have rotational symmetry like a radial blur. If the errors are independent, the Gaussian kernel or signal S that should be transferred from (0, 0) to (x, y) in Cartesian coordinates is:

\[
S(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}
\exp\!\left(-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}\right)\right)
\qquad (3)
\]

where σx and σy are the standard deviations in the Cartesian coordinates corresponding to the two physical dimensions. Because the blurred values are reported relative to the original bin, only the exponential factor affects the relative weights. We only need to evaluate S(x, y) once for any pair of σx and σy, and in practice only for a limited number of hexagonal bins (Figure 4):

Figure 4: Gaussian blur of a single bin (red circle) in the case σx = 2 and σy = 1 in units of hexagon side lengths. The numbers represent the transferred signal relative to the blurred value of the original bin (100%). All other bins are below 0.1% in this example.

We implemented a generic function combining Equations 2 and 3 for creating a blurred histogram data matrix in Python as part of SCOPE (Search and Chemical Ontology Plotting Environment). The SCOPE project includes tools for executing literature searches, collecting text-mined chemical entities of biological interest and creating interactive visualizations. The histograms were constructed using pandas and visualized by Bokeh version 6.4.0. Bokeh is particularly popular for creating interactive visualizations for web browsers that scale well to large datasets, including the millions of named entities that can be retrieved from a single literature search. The Bokeh interface in SCOPE includes sliders to adjust the blurring and scaling, and displays the dominating chemical classes in any selected bin.

Verifications
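The transformations and the kernel can also be checked numerically. The following is a minimal sketch of Equations 1 to 3, not the SCOPE implementation: the function names and the `radius` cutoff are hypothetical, the hexagon side is taken as the unit length, and the kernel prefactor assumes the product of two independent one-dimensional Gaussians.

```python
import math

SQRT3 = math.sqrt(3.0)


def offset_to_cartesian(x, y):
    """Equation 1: offset (column x, row y) to the Cartesian bin center."""
    shift = SQRT3 / 2 if x % 2 else 0.0  # odd columns are shifted by sqrt(3)/2
    return 1.5 * x, SQRT3 * y + shift


def axial_to_cartesian(q, r):
    """Equation 2: axial (q, r) to the Cartesian bin center."""
    return 1.5 * q, -SQRT3 * (q / 2 + r)


def gaussian_kernel(x, y, sigma_x, sigma_y):
    """Equation 3: axis-aligned Gaussian with independent errors (assumed form)."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    return norm * math.exp(-(x**2 / (2 * sigma_x**2) + y**2 / (2 * sigma_y**2)))


def blur_weights(sigma_x, sigma_y, radius=3):
    """Kernel values relative to the central bin, keyed by axial offset (q, r).

    Only tiles within `radius` hexagonal steps of the center are evaluated.
    """
    center = gaussian_kernel(0.0, 0.0, sigma_x, sigma_y)
    weights = {}
    for q in range(-radius, radius + 1):
        r_min = max(-radius, -q - radius)
        r_max = min(radius, -q + radius)
        for r in range(r_min, r_max + 1):
            x, y = axial_to_cartesian(q, r)
            weights[(q, r)] = gaussian_kernel(x, y, sigma_x, sigma_y) / center
    return weights


if __name__ == "__main__":
    print(offset_to_cartesian(1, 0), axial_to_cartesian(1, -1))
    for tile, weight in sorted(blur_weights(2.0, 1.0, radius=2).items()):
        print(tile, f"{100 * weight:.1f}%")
```

Both `offset_to_cartesian(1, 0)` and `axial_to_cartesian(1, -1)` give the bin center (3/2, √3/2). With σx = 2 and σy = 1, the bin at axial (2, -1), three side lengths away along x, receives a larger relative weight than the bin at (0, -1), which is only √3 side lengths away along y. This is the intended non-radial behavior.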
To visually verify that the Gaussian blurring does what it was designed to do, we can look at the effect of varying σx and σy in the kernel (Eq. 3):

Figure 5: Examples of a single blurred hexagonal bin using the numbers in Figure 4 (top) and with σx = 4 and σy = 2 (bottom), restricting the blur to the same cells. The color mapping in both cases is linear from white at 0% to black at 100% of the blurred value of the central bin.

We tested our Gaussian blurring implementation for Bokeh using SCOPE and searching Europe PMC for scientific papers mentioning particular techniques from analytical chemistry in their section-tagged materials and methods sections (Palmblad, 2019). Hexagonal bins more faithfully traced the shapes of log P/mass distributions of chemical classes, and the resulting histograms exhibited different distributions, as expected from what is known about the applicability of the analytical techniques:

Figure 6: Distributions of mass (Da) and polarity (log P) of small molecules retrieved from papers in Europe PMC mentioning hydrophilic interaction chromatography (HILIC) in their methods section. The distributions are visualized by SCOPE before (left) and after (right) applying a non-uniform Gaussian blur of size 2.5. The saturation was set to 2.5, term frequency-inverse document frequency normalization applied and the Viridis colormap selected.

The sliders for adjusting the blurring and saturation improve the user experience and help create clear visualizations. The blurring conveys the uncertainty in the log P predictions. We also found interactive bin selection to be easier in hexagonal bins than in equiareal rectangular bins. Other features and applications of SCOPE are beyond the scope of this paper (no pun intended), but will be described elsewhere.

Conclusion
In this short paper we have shown how to calculate and apply non-uniform Gaussian blurs in Cartesian coordinates in hexagonal bins, and illustrated this with a real-world example. For simplicity, we evaluated the Gaussian kernel in the center of each hexagonal bin. It may be argued that the average kernel value in the bin more accurately represents the underlying distribution. However, this would require numerical integration over all these hexagonal bins for what would likely be a very small visual or practical benefit. All source code is available on GitHub (ReinV/SCOPE) under the Apache 2.0 license.
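For readers who want to compare the two options, the bin-averaged kernel mentioned above can be approximated with a simple midpoint rule. This is a rough sketch under stated assumptions, not part of SCOPE: the helper names are hypothetical, the bins are taken to be flat-top hexagons with unit side length (consistent with the column spacing of 3/2 in Equations 1 and 2), and the kernel is used without its prefactor because only relative values matter.

```python
import math

SQRT3 = math.sqrt(3.0)


def kernel(x, y, sigma_x, sigma_y):
    """Unnormalized axis-aligned Gaussian (Equation 3 without its prefactor)."""
    return math.exp(-(x * x / (2 * sigma_x**2) + y * y / (2 * sigma_y**2)))


def in_unit_hexagon(u, v):
    """True if (u, v) lies inside a flat-top hexagon of side 1 centered at 0."""
    return abs(v) <= SQRT3 / 2 and SQRT3 * abs(u) + abs(v) <= SQRT3


def bin_average(cx, cy, sigma_x, sigma_y, n=200):
    """Midpoint-rule average of the kernel over the hexagon centered at (cx, cy).

    Sample points are the cell midpoints of an n-by-n grid over the bounding
    box of the hexagon; points outside the hexagon are skipped.
    """
    total, count = 0.0, 0
    for i in range(n):
        u = -1.0 + (2 * i + 1) / n
        for j in range(n):
            v = (-1.0 + (2 * j + 1) / n) * SQRT3 / 2
            if in_unit_hexagon(u, v):
                total += kernel(cx + u, cy + v, sigma_x, sigma_y)
                count += 1
    return total / count


if __name__ == "__main__":
    sigma_x, sigma_y = 2.0, 1.0
    for cx, cy in [(0.0, 0.0), (1.5, SQRT3 / 2), (3.0, 0.0)]:
        center = kernel(cx, cy, sigma_x, sigma_y)
        average = bin_average(cx, cy, sigma_x, sigma_y)
        print((cx, cy), center, average)
```

The cost grows with the number of sample points per bin, which illustrates the extra work that center evaluation avoids.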
References
Bokeh Development Team (2019). Bokeh: Python library for interactive visualization. Bokeh Development Team. https://bokeh.org/.

Carr, D. B., R. J. Littlefield, W. L. Nicholson, and J. S. Littlefield (1987). Scatterplot matrix techniques for large n. Journal of the American Statistical Association 82(398), 424–436.

Höhna, S. (2013). Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes. Bioinformatics 29(11), 1367–1374.

Palmblad, M. (2019). Visual and semantic enrichment of analytical chemistry literature searches by combining text mining and computational chemistry. Analytical Chemistry 91(7), 4312–4316.

Patel, A. (2013). Hexagonal grids blog.