A fast and practical grid based algorithm for point-feature label placement problem

Abstract

Point-feature label placement (PFLP) is a major area of interest within the field of automated cartography, geographic information systems (GIS), and computer graphics. The objective of a label placement problem is to assign a label to each point feature so as to avoid conflicts while respecting cartographic conventions. According to computational complexity analysis, the labeling problem has been shown to be NP-hard. It is also very challenging to find a computationally efficient algorithm that can be used for both static and dynamic map labeling. In this paper, we propose a heuristic method that first fills the free space of the map with rectangular labels arranged like a grid and then matches each point feature with the nearest label. The performance of the proposed algorithm was evaluated through empirical tests with different dataset sizes. The results show that our algorithm based on grid placement of labels is a useful, fast and practical solution for automated map labeling.



Yasemin Özkan Aydın a,∗, Kemal Leblebicioğlu b
a School of Physics, Georgia Institute of Technology, Atlanta, USA
b Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey


1. Introduction

Visualization of information on graphical displays is a very important task when producing user-friendly, informative maps. Labels are an essential part of maps when identifying point (e.g., cities, towns, mountains), line (e.g., streets, rivers), or area (e.g., countries, oceans) features. Point-feature label placement (PFLP) is a challenging problem in the area of automated cartography and geographic information systems (GIS). The aim is to place labels of a certain shape near their corresponding point features while considering cartographic rules such as [1, 2]:

• The size of a label must suit the text written in it,
• A label must not overlap other labels or features,
• The connection between a label and its associated feature should be clear,
• The algorithm should be fast and accurate,
• A label must be placed in the best possible location.

Although humans are successful in overcoming basic labeling problems such as conflict and uncertainty, obtaining a map or drawing with perfectly placed labels is very time consuming and non-trivial to do manually. Therefore, developing computer algorithms for automated label placement has received much attention from scientists in a wide range of fields, particularly cartography, architecture, computational geometry, image analysis, and navigation systems.

∗ Corresponding author. Email address: [email protected] (Yasemin Özkan Aydın)

To display information about objects in an interactive map, such as the type of an aircraft, the names of buildings in a dangerous area, or the type of military supplies that an aircraft carries, or to draw attention to a hazardous area, the labeling process must be done quickly and automatically. Especially in real-time applications, where users can change the scale and viewpoint of the map continuously, the run-time of the algorithm is a critical factor. In most algorithms a considerable amount of time is spent detecting label-label or label-feature overlaps [3]. If too many objects are close together on the screen, the labels cause clutter or some objects are not labeled properly. Rather than produce results that obey all rules of good labeling, our goal is to guarantee that all objects are labeled adequately in the map. The major limitation of the present study is that all labels must have a common size and type.

Simulated annealing (SA) is the most commonly used cartographic labeling algorithm. It is an energy-based, iterative and stochastic global search algorithm [4, 5]. Genetic algorithms (GA) have been applied to solve various optimization problems. It has been shown that SA and GA exhibit the best performance in terms of the ratio of conflict-free labeled points, but SA produces a solution faster than GA as the number of nodes increases [6, 7, 8].

Generally, a cartographic labeling algorithm consists of three subtasks: (1) label candidate position selection, (2) cost evaluation, (3) label assignment [9]. A candidate label that touches the point feature can be placed at one of 1, 2, 4 or 8 fixed positions or moved continuously around the node. After all candidate label positions are defined, the conflict graph is obtained based on overlaps between the labels and nodes. Optimization algorithms or heuristic methods then find the best label configuration with minimum overlap, considering cartographic preferences. If the algorithm cannot obtain a conflict-free result, some labels can be removed. The time required to select candidate label positions determines the quality (computation time) of the algorithm. Our algorithm first fills the free space of the map with evenly spaced, axis-parallel rectangular labels. This gives a conflict-free candidate label set (CLS), and if the number of labels in CLS is greater than or equal to the number of nodes, then all nodes can be labeled without any label-label or label-node overlap. The main contribution of our paper is to solve the conflict problem at the phase of selecting the candidate label set, whereas other algorithms in the literature solve it after obtaining a candidate label set.

If the density of the points to be labeled does not allow labels to be placed without conflict, a leader line may be used to show the correspondence between point and label [4, 10, 11]. In this case, labels are placed away from the point, and a straight line or a combination of parallel and orthogonal lines connects the point to the label [11, 12, 13]. The objective is to find a leader of minimum length without overlap [13, 14, 15, 16]. The length of the leader is important, since the shorter the leader line, the smaller the probability of two lines intersecting [17]. The ports where leaders touch labels may be prescribed or arbitrary. Most studies draw a frame around the map and place the labels outside this frame on one [11, 12, 18], two or four sides [13, 15, 16], whereas our study allows leader-connected labels to be placed not only at the boundary but anywhere in the map where there is empty space.

In this paper, we propose an efficient and simple heuristic method that can also be used in real-time applications, and we report on a series of empirical tests that show its performance. The aim is to obtain the best label positions in a predefined map without any overlap. The input of the system is n point features and corresponding labels whose sizes are known. The outputs are the placement of the labels in the map and the connection of each label to its associated feature with the shortest line. This simple algorithm can be used in the fields of cartography, computational geometry, and information visualization.

Preprint submitted to Computer Science, December 19, 2017 (arXiv [cs.CE]).
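The candidate-set idea described in the introduction (fill the free space with an evenly spaced grid, then compare the number of candidate labels with the number of nodes) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function name `grid_candidate_labels`, the parameters `lsd` and `ssd` (spacing between labels and margin to the map edge), and the rule of simply skipping any grid cell that contains a point are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def grid_candidate_labels(
    points: List[Point],
    map_w: float,
    map_h: float,
    w: float,
    h: float,
    lsd: float,
    ssd: float,
) -> List[Rect]:
    """Fill the map with evenly spaced w-by-h rectangles (hypothetical helper).

    Rows start at the bottom-left corner. A cell that contains a point
    feature is skipped, so the returned rectangles overlap neither each
    other nor any point.
    """
    labels: List[Rect] = []
    y = ssd
    while y + h <= map_h - ssd:
        x = ssd
        while x + w <= map_w - ssd:
            covered = any(
                x <= px <= x + w and y <= py <= y + h for px, py in points
            )
            if not covered:
                labels.append((x, y, x + w, y + h))
            x += w + lsd  # horizontal gap between neighbouring labels
        y += h + lsd  # vertical gap between rows
    return labels


def can_label_all(points: List[Point], labels: List[Rect]) -> bool:
    """All nodes can receive a label when |CLS| >= number of nodes."""
    return len(labels) >= len(points)
```

Because the rectangles are generated on disjoint grid cells, the conflict check reduces to a point-in-rectangle test per cell; the only remaining question is whether the set is large enough, which `can_label_all` answers.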

2. Grid Based Label Placement Algorithm

In this section, we introduce the terminology used throughout the paper and explain the details of the grid based algorithm intended to be used for labeling point features. A graphical illustration of the labeling problem is given in Fig. 1. A leader line (see Fig. 2) is used to show the correspondence between the label and the point feature. The graph boundaries are defined as $(D_x^{min}, D_x^{max})$ and $(D_y^{min}, D_y^{max})$. We leave some distance between the labels and the graph boundary so that labels can be clearly distinguished from the edges of the graph. Other terms used throughout the paper are given in Table 1.

Figure 1: The graphical illustration of the graph boundary and definitions used in the paper. A leader line is used to create a visual connection between the label and its corresponding point feature. We left some space between the graph boundary and labels to reduce the ambiguity. (Annotations in the figure: screen safe distance (SSD), label safe distance (LSD), graph boundary, rectangular label of width w and height h, point feature or node, leader, and the boundary corners $(D_x^{min}, D_y^{min})$, $(D_x^{min}, D_y^{max})$, $(D_x^{max}, D_y^{min})$, $(D_x^{max}, D_y^{max})$.)

Figure 2: The leader is connected to its corresponding label with the shortest line. If a point feature is in the filled area, a leader connects the point feature to the nearest corner of its corresponding label.

The input of the labeling problem consists of a set $P = \{p_1, p_2, \ldots, p_n\} \subseteq \mathbb{R}^2$ of n randomly generated point features, where $p_i = (p_{ix}, p_{iy})$, $i = 1, 2, \ldots, n$. Each point $p_i$ is associated with an axis-parallel rectangular label $l_i$ of width w and height h. $L$ is the set of all label positions, $L_p$ is the set of top-k closest labels of all point features to be labeled, and $L_{p_i}$ is the set of top-k closest labels of point feature $p_i$ of $P$. The task is to assign a label from the set $L$ to each point feature in 2-dimensional space. A label should be close to the point to which it belongs and should not overlap other labels or graphical features. Additionally, the center $(l_{ix}, l_{iy})$ of each label in the set $L$ must satisfy the constraints of the graph boundaries,

$$D_x^{min} + w < l_{ix} < D_x^{max} - w, \qquad D_y^{min} + h < l_{iy} < D_y^{max} - h. \quad (1)$$

Unlike the algorithms [19, 20] that consider a finite number of positions tangential to the point feature, or the slider model [21] that allows any position along the edges of the label, our method is based on placing as many axis-parallel rectangular labels of fixed height and width as possible in a predefined map without overlap. The label placement is similar to asymmetric graph paper which has some space within each division. The labeling process can be subdivided into three stages:

1. Calculation of potential label positions,
2. Ranking of the labels according to their distances to graphical features,
3. Assignment of labels to the corresponding point features.

Figure 3: (a) Potential label positions. (b) The label positions after assignment.

The detailed algorithm for the grid based label placement is as follows:

• Choose the set of randomly generated point features in the plane (in a real-time application they are obtained from GPS data or defined by the user) and place them in a map.
• Place fixed-size rectangular labels into the map side by side, without overlap with other labels and nodes. The labels are arrayed in rows with some space between each other and positioned horizontally, starting from the bottom-left corner of the scene. In order to increase visibility and clarity, we leave horizontal and vertical white space between labels, called the label safe distance (LSD). If the horizontal distance $x_d$ between the right edge of a label and the corresponding point feature satisfies $LSD < x_d < LSD + w$, then the label is shifted along the x-axis until $x_d = LSD$, without overlapping other map features in the vicinity of the point feature. We call this step the sweep-phase of the algorithm (see Fig. 4).
• Store the (x, y) coordinates of the four corners of all labels in an m×4 matrix, where m is the number of labels.
• Find the nearest corner of each label for the n nodes by calculating the distance

$$d_{ij} = \lVert x_i - x_j \rVert, \quad i = 1, \ldots, n \text{ and } j = 1, \ldots, m \quad (2)$$

between each node and the four corners of all labels.
• Find the top-k nearest labels of each node and store the label numbers and their positions in the Nearest Label Matrix (NLM).
• First, the labels in the first row of the NLM are assigned to the nodes. At the end of each assignment, we remove the number of the assigned label from all rows and columns of the NLM. If there are unlabeled nodes after the assignment of the first nearest labels, we continue with the second nearest labels. This procedure continues until all nodes have one label. When the label closeness level is the same for more than one node, some leader lines can overlap other labels. In the final label assignment produced by our algorithm, no assigned label overlaps any other label or node.
• After all nodes are connected with labels, the unused labels are erased and the rest are drawn on the screen with a leader line (see Fig. 2).

Figure 4: The sweep-phase of the algorithm. Labels are arranged side by side so that there is some space between them. The labels (dashed) located away from the node are shifted along the x-axis from left to right to their final position (solid).

Table 1: The meaning of the terms used in the paper.

Point feature or Node: A graphical feature to be labeled.
Leader: The shortest line that connects a label to the corresponding point feature.
Label closeness level: The closeness order of the n-th nearest label to the corresponding point.
Nearest Label Matrix (NLM): An n-by-k matrix that stores the numbers of the top-k closest labels of all point features.
Label safe distance (LSD): A default horizontal and vertical distance between labels.
Screen safe distance (SSD): A default distance between labels and screen.
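The distance, ranking and assignment steps listed above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the authors' implementation: the names `corner_distance`, `nearest_label_matrix` and `assign_labels` are hypothetical, each row of the matrix here belongs to one node, ties between nodes that want the same label are broken by node index, and a set of used labels stands in for deleting entries from the NLM.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def corner_distance(p: Point, r: Rect) -> float:
    """Euclidean distance from p to the nearest of the four corners of r."""
    x0, y0, x1, y1 = r
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    return min(math.hypot(p[0] - cx, p[1] - cy) for cx, cy in corners)


def nearest_label_matrix(
    points: List[Point], labels: List[Rect], k: int
) -> List[List[int]]:
    """Row i holds the indices of the k labels closest to point i, nearest first."""
    nlm: List[List[int]] = []
    for p in points:
        ranked = sorted(
            range(len(labels)), key=lambda j: corner_distance(p, labels[j])
        )
        nlm.append(ranked[:k])
    return nlm


def assign_labels(nlm: List[List[int]]) -> Dict[int, int]:
    """Greedy assignment by closeness level.

    Level 0 (nearest label) is tried for every node first, then level 1
    for the nodes still unlabeled, and so on. A label already taken is
    skipped, which plays the role of removing it from the NLM.
    """
    assignment: Dict[int, int] = {}  # node index -> label index
    used: Set[int] = set()
    levels = max((len(row) for row in nlm), default=0)
    for level in range(levels):
        for i, row in enumerate(nlm):
            if i in assignment or level >= len(row):
                continue
            j = row[level]
            if j not in used:
                assignment[i] = j
                used.add(j)
    return assignment
```

A node stays unlabeled in this sketch when all of its k nearest labels are taken by other nodes, so k has to be chosen large enough for dense maps; the leader for node i would then be drawn to the nearest corner of `labels[assignment[i]]`.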

3. Results

We have implemented the algorithm in MATLAB (R2011a), and all tests were run on an Intel(R) Core(TM) i7-2630QM 2.00 GHz CPU with 4 GB of RAM. We randomly placed n nodes on a region of size 3000 by 4000. For the experiments, labels are axis-parallel rectangles and each graphical feature is associated with the same number of equal-sized labels. In our implementation, the construction of an initial set L of label positions, the calculation of label-node distances, and the formation of the matching follow the method described in Sec. 2. Depending on the number of nodes, the label size can be adjusted to speed up the label placement phase. We ran two sets of experiments:

i. The label size is fixed and we changed the node number,
ii. The labels are rectangles and, in successive runs of the algorithm, we changed the height and width of the label.

In the first group of tests, we fixed the label size (w = 150, h = 100). Since the results can be affected by the particular distribution of nodes, we conducted a series of simulations with different node numbers. For each size of random dataset, we performed 100 trials, and the results were averaged. As seen in Fig. 6, the label assignment procedure takes less than one minute for all datasets, and there is an almost linear relationship between label number and run-time of the algorithm. The algorithm runs slower for smaller labels because the initial set of label positions is much larger for smaller labels. A large share of the time is spent filling the free space of the map with labels at the beginning of the algorithm. Fig. 5 shows screenshots of the final label assignments for different node numbers. Once the size of the labels increases above a certain threshold, the labeling quality decreases quickly, since the overlap between labels and leader lines increases. It will be an interesting problem to find efficient techniques that detect overlaps of labels with leader lines.

We have also looked at the relationship between the node number and the time percentages of the three stages of the labeling algorithm given in Sec. 2. In the experiments we see that the time percentage ($t_3$) of the third part (assignment of labels to the corresponding point features) is very small (less than 1%) compared to the time percentages of the other parts (calculation of potential label positions, $t_1$, and obtaining the label-to-node distance matrices, $t_2$) and can be neglected. To understand how $t_1$ and $t_2$ change with the node number, for each node set (n = 50 to 500 with an increment of ten) we performed 200 trials, keeping the size of the labels and the map constant, and averaged the results.

Figure 5: The final label positions of the randomly generated map with node number (a) n = 25, (b) n = 50, (c)-(f) n = [...].

Figure 6: Results of empirical testing of the algorithm on randomly generated map data (label size is 150x100 units, map size is 3000x4000 units). The vertical axis shows the CPU time (in seconds) of the algorithm for different node numbers. The results are averaged over a hundred trials. (Axes: Number of Nodes vs. Mean Run Time (sec).)
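The measurement protocol described above (random nodes on a 3000 by 4000 region, repeated trials, and per-stage time percentages averaged over the trials) might be organized as in the following Python sketch. The helper names `random_nodes`, `stage_percentages` and `mean_percentages` are hypothetical, the stage callables are placeholders for the three stages of Sec. 2, and wall-clock timing with `time.perf_counter` is an assumption; the paper reports CPU time measured in MATLAB.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Stage = Tuple[str, Callable[[], object]]


def random_nodes(
    n: int, width: float = 3000.0, height: float = 4000.0,
    seed: Optional[int] = None,
) -> List[Point]:
    """Draw n uniformly distributed nodes inside the map rectangle."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


def stage_percentages(stages: Sequence[Stage]) -> Dict[str, float]:
    """Run each stage once and return its share of the total time in percent."""
    elapsed: Dict[str, float] = {}
    for name, run in stages:
        start = time.perf_counter()
        run()
        elapsed[name] = time.perf_counter() - start
    total = sum(elapsed.values()) or 1.0  # guard against a zero total
    return {name: 100.0 * t / total for name, t in elapsed.items()}


def mean_percentages(
    make_stages: Callable[[int], Sequence[Stage]], trials: int
) -> Dict[str, float]:
    """Average the per-stage percentages over several independent trials."""
    sums: Dict[str, float] = {}
    for trial in range(trials):
        for name, pct in stage_percentages(make_stages(trial)).items():
            sums[name] = sums.get(name, 0.0) + pct
    return {name: s / trials for name, s in sums.items()}
```

Passing the trial index to `make_stages` lets each trial draw a fresh node set, for example with `random_nodes(n, seed=trial)`, so that the averages are not tied to one particular node distribution.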

Figure 7: The time percentages ($t_1$, $t_2$) of the algorithm steps for node numbers 50 to 500 with an increment of 10. $t_1$ is the percentage of label placement time, $t_2$ is the percentage of label-to-node distance calculation time. (Axes: $t_1$, $t_2$, Node Number.)

As seen from Fig. 8, an increase in the number of nodes reduces $t_1$ and increases $t_2$, since the relation between the number of nodes and the number of labels placed in the empty space of the map is not linear. For example, a tenfold increase in the node number reduces the area for placing labels by only about 15%.

We are also interested in how the total run-time of the algorithm is affected by node number and label size. Fig. ?? shows the running time of the algorithm, where the height and width of the labels are represented in the x-y direction. We increased h from 50 to 150 and w from 130 to 200 with an increment of 10. We performed 20 trials for each label size and repeated this for n = 50, 100, 150, 250 and 500.

Figure 8: The time percentages ($t_1$, $t_2$) of the algorithm steps for node numbers 50 to 500 with an increment of 10. $t_1$ is the percentage of label placement time, $t_2$ is the percentage of label-to-node distance calculation time. (Axes: $t_1$, $t_2$, Node Number.)

Figure 9: The relation between the ratio of label area ($L_a$) to map area ($M_a$) and the time percentages ($t_1$, $t_2$) of the algorithm steps for node numbers 50, 100, 150, 250 and 500. $t_1$ is the percentage of label placement time, $t_2$ is the percentage of label-to-node distance calculation time. (Axes: $t_1$, $t_2$, $L_a/M_a$ in %; one curve each for n = 50, 100, 150, 250, 500.)

Table 2: The relation between label size $L_a$ and time percentage of the algorithm when the label size is increased five-fold ($L_a \times 5$). $t_1$ and $t_2$ represent the initial percentages of time.

n      Δt1/t1 (%)    Δt2/t2 (%)
500    100 ↑         40 ↓
250    32 ↑          [...] ↓
150    18 ↑          [...] ↓
100    10 ↑          [...] ↓
50     1 ↑           33 ↓

Increasing the label size increases $t_1$ and decreases $t_2$, but the amount of change is not the same for all sets. For example, when n = 500, a five-fold increase in label size increases $t_1$ by about 100% and decreases $t_2$ by about 40%, but for n = 50 these values are 1% and 33%, respectively. Values for other numbers of nodes are summarized in Table 2. We conclude that the performance of the labeling algorithm is much more sensitive to label size when the number of nodes increases.

Comparing our algorithm's performance with previous approaches in terms of accuracy and computing time is difficult, since most previous approaches assume that the label touches its corresponding feature, and either measure the efficiency of the algorithm by the number of features labeled in the final solution without considering time, or consider only the algorithm's speed without maximizing the number of labeled features. Furthermore, they do not specify the speed and properties of the system on which their algorithm was run, which is the most important comparison criterion.
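As a small worked example of the label-area-to-map-area ratio $L_a/M_a$ used in Fig. 9, the following Python sketch computes the ratio in percent. It assumes that $L_a$ is the area of a single w-by-h label and $M_a$ the area of the 3000 by 4000 map; the function name `label_area_ratio_percent` is hypothetical.

```python
def label_area_ratio_percent(w: float, h: float, map_w: float, map_h: float) -> float:
    """Area of one w-by-h label as a percentage of the map area."""
    return 100.0 * w * h / (map_w * map_h)


# Smallest, reference and largest label sizes from the second experiment set.
for w, h in [(130, 50), (150, 100), (200, 150)]:
    print(w, h, round(label_area_ratio_percent(w, h, 3000, 4000), 3))
```

Under this assumption, the tested label sizes span roughly 0.05% to 0.25% of the map area.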

4. Conclusion

The method can be used to label any feature-based graph (e.g., data points). We should implement some of the more important labeling rules set forth by [1] and [2] in order to appeal to a wide audience. For example, while penalties for overlaps are included in the algorithm, there is no term corresponding to the spacing between labels, which may be important for visual aesthetics. In addition, the labels are horizontally aligned and cannot be tilted. Although this is the case for the majority of labeling problems, there are graphs where a different orientation of the label might be useful (e.g., labeling the different functional dependences of a time-series graph). Implementing such additional features and rules can be an important direction for this work. Currently, the algorithm supports labeling point feature graphs.

5. Acknowledgment

This research was partially supported by ATOS IT Consulting Customer Service Industrial Trade Co., Turkey, in the scope of the AIRC2IS R&D / Software Development Project.

6. References

[1] E. Imhof, Positioning names on maps, Cartography and Geographic Information Science (1975) 128–144.
[2] P. Yoeli, The logic of automated map lettering, The Cartographic Journal 9 (1972) 99–108.
[3] K. G. Kakoulis, I. G. Tollis, Algorithms for the multiple label placement problem, Computational Geometry 35 (3) (2006) 143–161, ISSN 0925-7721, doi:http://dx.doi.org/[...]/j.comgeo.2006.03.005.
[4] S. Zoraster, Practical results using simulated annealing for point feature label placement, Cartography and Geographic Information Systems 24 (4) (1997) 228–238, doi:10.1559/[...].
[5] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Optimization by simulated annealing, Science 220 (4598) (1983) 671–680.
[6] F. Hong, L. Kaijun, Z. Zuxun, An efficient and robust genetic algorithm approach for automated map labeling, in: Proceedings of the ISPRS Conference, Part B4, Istanbul, Turkey, 617–622, 2004.
[7] G. R. Raidl, A Genetic Algorithm for Labeling Point Features, 1998.
[8] S. van Dijk, D. Thierens, M. de Berg, Using Genetic Algorithms for Solving Hard Problems in GIS, GeoInformatica 6 (4) (2002) 381–413, ISSN 1573-7624, doi:10.1023/A:1020809627892, URL http://dx.doi.org/10.1023/A:1020809627892.
[9] S. Edmondson, J. Christensen, J. Marks, S. Shieber, A general cartographic labelling algorithm, Cartographica 33 (4) (1996) 13–24, doi:10.3138/U3N2-6363-130N-H870.
[10] I. Vollick, D. Vogel, M. Agrawala, A. Hertzmann, Specifying label layout style by example, in: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, UIST '07, ACM, New York, NY, USA, ISBN 978-1-59593-679-0, 221–230, doi:10.1145/[...].
[11] [...]
[12] [...]
[13] [...] Wolff, Boundary labeling: Models and efficient algorithms for rectangular maps, Computational Geometry 36 (3) (2007) 215–236, ISSN 0925-7721, doi:http://dx.doi.org/[...]/j.comgeo.2006.05.003.
[14] M. A. Bekos, M. Kaufmann, A. Symvonis, Efficient labeling of collinear sites, Journal of Graph Algorithms and Applications 12 (3) (2008) 357–380, doi:10.7155/jgaa.00170.
[15] M. A. Bekos, M. Kaufmann, M. Nöllenburg, A. Symvonis, Boundary labeling with octilinear leaders, Algorithmica 57 (3) (2009) 436–461, ISSN 1432-0541, doi:10.1007/s00453-009-9283-6.
[16] P. Kindermann, B. Niedermann, I. Rutter, M. Schaefer, A. Schulz, A. Wolff, Multi-sided boundary labeling, Algorithmica (2015) 1–34, ISSN 1432-0541, doi:10.1007/s00453-015-0028-4.
[17] E. Wang, A D3 plug-in for automatic label placement using simulated annealing, 2013.
[18] M. Benkert, H. Haverkort, M. Kroll, M. Nöllenburg, Graph Drawing: 15th International Symposium, GD 2007, Sydney, Australia, September 24-26, 2007, Revised Papers, chap. Algorithms for multi-criteria one-sided boundary labeling, Springer Berlin Heidelberg, Berlin, Heidelberg, 243–254, 2008.
[19] J. Christensen, J. Marks, S. Shieber, An empirical study of algorithms for point-feature label placement, ACM Trans. Graph. 14 (3) (1995) 203–232, ISSN 0730-0301.
[20] M. Yamamoto, G. Câmara, L. A. N. Lorena, Tabu search heuristic for point-feature cartographic label placement, GeoInformatica 6 (1) (2002) 77–90.
[21] M. van Kreveld, T. Strijk, A. Wolff, Point labeling with sliding labels, Computational Geometry 13 (1) (1999) 21–47, ISSN 0925-7721.