A Real-Time Spatial Index for In-Vehicle Units
Magnus Lie Hetland, Norwegian University of Science and Technology, [email protected]
Ola Martin Lykkja, Q-Free ASA, Trondheim, Norway, [email protected]
Abstract
We construct a spatial indexing solution for the highly constrained environment of an in-vehicle unit in a distributed vehicle tolling scheme based on satellite navigation (GNSS). We show that a purely functional implementation of a high-fanout quadtree is a simple, practical solution that satisfies all the requirements of such a system.
Open road tolling is an increasingly common phenomenon, with transponder-based tolling proposed as early as 1959 [1], and a wide variety of more complex technologies emerging over recent decades [2]. One of the more recent developments is the use of satellite navigation (GNSS), with geographical points and zones determining the pricing [see, e.g., 3]. In this paper, we examine the feasibility of maintaining a geographical database in an in-vehicle unit, which can perform many of the tasks of such a location-based system independently.

This is a proof-of-concept application paper. Our main contributions can be summed up as follows: (i) We specify a set of requirements for a spatial database to be used in the highly constrained environment of a real-time, low-cost in-vehicle unit in Section 2; (ii) we construct a simple data structure that satisfies these requirements in Section 3; and (iii) we tentatively establish the feasibility of the solution through an experimental evaluation in Section 4. In the interest of brevity, some technical details have been omitted. See the report by Lykkja [4] for more information.

This paper was presented at the NIK-2018 conference.

The basic functionality of our system is to retrieve relevant geographic (i.e., geometric) objects, that is, the tolling zones and virtual toll gantries that are within a certain distance of the vehicle. Given the real-time requirements and the limited speed of the available hardware, a plain linear scan of the data would be infeasible even with about 50 objects. A real map of zones and gantries would hold orders of magnitude more objects than that (cf. Table 3). This calls for some kind of geometric or spatial indexing [5], although the context places some heavy constraints on the data structure used. One fundamental consideration is the complexity of the solution. In order to reduce the probability of errors, a simple data structure would be preferable. Beyond simplicity, and the need for high responsiveness, we have a rather non-standard hardware architecture to contend with.

The memory of the on-board unit is assumed to be primarily flash memory with serial access. The scenario is similar to that of a desktop computer, where the index would be stored on a hard drive, with a subset in RAM and the L2 cache. For an overview of the hierarchical nature of the memory architecture, see Fig. 1 and Table 1. The serial nature of the memory forces us to read and write single bits at a time in a page. A read operation takes 50 µs. A write operation takes 1 ms, and may only alter a bit value of 1 to a bit value of 0. A sector, subsector or page may be erased in a single operation, filling it with 1-bits in approximately 500 ms. Another important constraint is that each page may typically only be erased a limited number of times (about 100 000), so it is crucial that our solution use the pages cyclically, rather than simply modifying the current ones in place, to ensure wear leveling.

Based on general needs of a self-contained tolling system, and on the hardware capabilities just described, we derive the following set of requirements:

A. The system must accommodate multiple versions of the same index structure at one time. The information may officially change at a given date, but the relevant data must be distributed to the units ahead of time.
B. It must be possible to distribute new versions as incremental updates, rather than a full replacement. This is crucial in order to reduce communication costs and data transfer times. It will also make it less problematic to apply minor corrections to the database.
C. The memory footprint must be low, as the memory available is highly limited.
D. The database must maintain 100 % integrity, even during updates, to ensure uninterrupted access. It must be possible to roll back failed updates without affecting the current database.
E. The indexing structure must be efficient in terms of CPU cycles during typical operations. Both available processing time and energy are highly limited and must not be wasted on, say, a linear scan over the data.
F. The relevant data structure must minimize the number of flash page reads needed to perform typical operations, especially for search. Flash-page reads will be a limiting factor for some operations, so this is necessary to meet the real-time operation requirements.
G. In order to avoid overloading individual pages (see discussion of memory architecture, above), wear leveling must be ensured through cyclic use.
H. The typical operations that must be available under these constraints are basic two-dimensional geometric queries (finding nearby points or zones) as well as modification by adding and removing objects.

In addition, the system must accommodate multiple independent databases, such as tolling information for adjacent countries. Such databases can be downloaded on demand (e.g., when within a given distance of a border), cached, and deleted on a least-recently-used basis, for example. We do not address this directly as a requirement, as it is covered by the need for a low memory footprint per database (Req. C).

As in most substantial lists of requirements, there are obvious synergies between some (e.g., Reqs. E and F) while some are orthogonal and some seem to be in direct opposition to each other (e.g., Reqs. A and C). In Section 3, we describe a simple index structure that allows us to satisfy all our requirements to a satisfactory degree.

Figure 1: Flash memory architecture (sectors 1 to k of 64 KiB each; each sector holds 16 subsectors of 4 KiB; each subsector holds 16 pages of 256 B; k = 16 . . .)

Table 1: Hierarchical memory architecture

Description                 Size              Access time           Persistent?
Processor internal memory   Ki-bytes          Zero wait state       No
External RAM                100 KiB           Non-zero wait states  No
Serial Flash                8, 16 or 32 MiB   See main text         Yes
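The flash constraints described above (writes can only clear bits, an erase resets a page to 1-bits, and each page tolerates a limited number of erases) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the names `FlashPage` and `CyclicAllocator` are hypothetical and do not come from the implementation described in [4], and the allocator simply assumes that a page is no longer in use by the time the cycle returns to it.

```python
PAGE_SIZE = 256   # bytes per flash page (Fig. 1)
ERASED = 0xFF     # an erased byte holds only 1-bits


class FlashPage:
    """Hypothetical model of a single flash page."""

    def __init__(self):
        self.data = bytearray([ERASED] * PAGE_SIZE)
        self.erase_count = 0

    def is_blank(self):
        return all(byte == ERASED for byte in self.data)

    def write(self, offset, payload):
        # Programming can only turn 1-bits into 0-bits, so the stored
        # value is the bitwise AND of the old and the new byte.
        for i, byte in enumerate(payload):
            self.data[offset + i] &= byte

    def erase(self):
        # Erasing is the only way to get 1-bits back.
        self.data[:] = bytes([ERASED]) * PAGE_SIZE
        self.erase_count += 1


class CyclicAllocator:
    """Hands out pages round-robin so that erases are spread evenly.

    Assumption (not from the paper): a page is free again by the time
    the cycle returns to it; a real allocator must skip live pages.
    """

    def __init__(self, n_pages):
        self.pages = [FlashPage() for _ in range(n_pages)]
        self.next_index = 0

    def allocate(self):
        index = self.next_index
        self.next_index = (index + 1) % len(self.pages)
        page = self.pages[index]
        if not page.is_blank():
            page.erase()
        return index, page
```

Because every allocation moves on to the next page, no single page (such as one holding a frequently updated root node) is erased more often than the others, which is the behavior Req. G asks for.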
The solution to the indexing problem lies in combining two well-known technologies: quadtrees and immutable data structures.

In the field of geographic and geometric databases, one of the simplest and most well-known data structures is the quadtree, which is a two-dimensional extension of the basic binary search tree. Just as a binary search tree partitions $\mathbb{R}$ into two halves, recursively, the quadtree partitions $\mathbb{R}^2$ into quadrants. A difference between the two is that where the binary search tree splits based on keys in the data set, the quadtree computes its quadrants from the geometry of the space itself.* In order to reduce the number of node (i.e., page) accesses, at the expense of more coordinate computations, we increase the grid of our tree from 2-by-2 to 9-by-9. The specific choice of this grid size is motivated by the constraints of the system. We need 3 bytes to address a flash page and with a 9-by-9 grid, we can fit one node into $9 \cdot 9 \cdot 3 = 243$ bytes, which is within the 256 B of a single flash page. For an overview of the resulting node sizes (area of ground covered), see Table 2. The last three columns show the number of leaf nodes at the various levels in the experimental build described in Section 4.

* Technically, this is the form of quadtrees known as PR quadtrees [5].

Immutable data structures have been used in purely functional programming for decades [see, e.g., 6], and they have recently become more well known to the mainstream programming community through the data model used in, for example, the Git version control system [see, e.g., 7, p. 3]. The main idea is that instead of modifying a data structure in place, any nodes that would be affected by the modification are duplicated. For a tree structure, this generally means the ancestor nodes of the one that is, say, added. Consider the example in Fig. 2. In Fig. 2(a), we see a tree consisting of nodes a through f, and we are about to insert g. Rather than adding a child to c, which is not permitted, we duplicate the path up to the root, with the duplicated nodes getting the appropriate child-pointers, as shown in Fig. 2(b), where the duplicated nodes are highlighted. As can be seen, the old nodes (dotted) are still there, and if we treat the old a as the root, we still have access to the entire previous version of the tree.

Our experiments were performed with a data set of approximately 30 000 virtual gantries (see Fig. 3(a)), generated from publicly available maps [8]. The maps describe the main roads of Norway with limited accuracy. Additionally, more detailed and accurate virtual gantries were created manually for some locations in Oslo and Trondheim. Fig. 4(a) shows the relevant virtual gantries in downtown Oslo, used in our test drive. There are about 35 virtual gantries on this route, and many of these are very close together. In general, there is one virtual gantry before every intersection.

Each local administrative unit (kommune) is present in the maps used [8], with fairly accurate and detailed boundaries. There are 446 such zones in total (see Fig. 3(b)). In addition, more detailed and accurate zones were created manually for some locations in the cities of Oslo and Trondheim. Some of these are quite small, very close together, and partially overlapping (see Fig. 4(b)).
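Returning to the index structure of Section 3, the combination of a 9-by-9 grid and path-copying updates can be sketched as follows. This is a simplified illustration, not the implementation from [4]: it stores points only, uses a fixed tree depth, and keeps nodes as in-memory tuples rather than 3-byte flash page addresses. The names `Node`, `cell_index`, `insert` and `lookup` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

GRID = 9                                   # 9-by-9 grid per node
ADDRESS_BYTES = 3                          # bytes needed to address a flash page
NODE_BYTES = GRID * GRID * ADDRESS_BYTES   # 243, fits in a 256 B page


@dataclass(frozen=True)
class Node:
    """Immutable node with one slot per grid cell (None means empty)."""
    children: Tuple[object, ...] = (None,) * (GRID * GRID)


def cell_index(x, y, x0, y0, size):
    """Return the cell containing (x, y) and that cell's origin and size."""
    step = size / GRID
    col = min(int((x - x0) / step), GRID - 1)
    row = min(int((y - y0) / step), GRID - 1)
    return row * GRID + col, x0 + col * step, y0 + row * step, step


def insert(node, x, y, obj, x0, y0, size, depth):
    """Return a new root containing obj; the old tree is left untouched."""
    node = node or Node()
    idx, cx, cy, step = cell_index(x, y, x0, y0, size)
    if depth == 0:
        # Leaf level: the slot holds a tuple of objects.
        new_child = (node.children[idx] or ()) + (obj,)
    else:
        new_child = insert(node.children[idx], x, y, obj, cx, cy, step, depth - 1)
    # Copy only this node; all other children are shared with the old version.
    children = node.children[:idx] + (new_child,) + node.children[idx + 1:]
    return Node(children)


def lookup(node, x, y, x0, y0, size, depth):
    """Return the objects stored in the leaf cell containing (x, y)."""
    while node is not None:
        idx, x0, y0, size = cell_index(x, y, x0, y0, size)
        child = node.children[idx]
        if depth == 0:
            return child or ()
        node, depth = child, depth - 1
    return ()
```

Each call to `insert` creates one new node per level along the path to the affected cell, so both the old and the new root remain valid, and every untouched subtree is shared between the two versions.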
Data Structure Build
These test data were inserted into the quadtree as described in Section 3. Table 3 summarizes the important statistics of the resulting structure. The numbers for memory usage are also shown in Fig. 5, for easier comparison.

Table 2: Tree levels. Sizes are approximate

Level   Size       Zone   VG       Both
Top     2 . × m    0      0        0
1       2 . × m    15     0        4
2       2 . × m    625    168      471
3       2 . × m    81     12 353   39 721
4       3 . × m    0      157      486
5       3 . × m    0      0        0

Figure 2: (a) Before insertion of g. (b) After insertion of g. Node g is inserted by creating new versions of nodes a and c (highlighted), leaving the old ones (dotted) in place

Figure 3: (a) VGs. (b) Zones. Virtual gantries and zones of Norway, with an illustrative quadtree grid

Figure 4: (a) VGs. (b) Zones. Virtual gantries and overlapping zones in downtown Oslo

Table 3: Tree performance numbers

    Description                            Zones    VGs      Both
a   Number of objects in database          448      29 037   29 485
b   Flash pages for index and data         2164     42 217   71 675
c   Size (MiB)                             0.53     10.31    17.50
d   Flash pages for index                  1716     13 180   42 190
e   Objects referenced by leafs            2481     43 263   95 272
f   Leaf nodes per object                  5 . .
g   Leaf entries not used, empty           240      27 403   1319
h   Leaf entries set                       733      13 179   41 207
i   Max index tree depth                   3        4        4
j   Zone inside entries                    57       -        30 379
k   Zone edge entries                      2406     -        21 333
l   Distinct leaf pages                    578      11 608   14 138
m   Total leaf pages                       721      12 678   40 682
n   Duplicate leaf pages                   143      1070     26 544
o   Flash pages, dups removed              2021     41 147   45 131
p   Size, dups removed (MiB)               0.49     10.05    11.02
q   Index pages, dups removed              1573     12 110   15 646
r   Size of index, dups removed (MiB)      0.38     2.96     3.82

Figure 5: The plot shows total flash pages used and flash pages used for the index, as well as the same with duplicates removed, for a database consisting of zones, virtual gantries, or both (cf. Table 3)

Judging from these numbers (row r), a database containing only zones would be quite small (about 0.38 MiB). Each zone is referenced by a little over 5 leaf nodes (row f). Also note that only 57 leaf nodes (squares) are entirely contained in a zone (row j). This implies that the geometric inside/outside calculations will need to be computed in most cases.

This can be contrasted with the combined database of gantries and zones. The index is larger (3.82 MiB, row r), but the performance of polygon assessments is much better. Each polygon is referenced from 115 leaf nodes (row f) and there are more inside entries than edge entries (30 379 vs. 21 333). This indicates that the geometric computations will be needed much less frequently.

Each of the three scenarios creates a number of duplicate leaf pages. Many leaf pages will contain the same zone edge/inside information. In our implementation of the algorithm, this issue is not addressed or optimized. It is, however, quite easy to introduce a reference-counting scheme or the like to eliminate duplicates, in this scenario saving about 6 MiB (as shown in rows o through r).

The zone database contains very few empty leaf entries (row g), because the union of the regions covers the entire country, with empty regions found only in the sea or in neighboring countries.

Flash Access in a Real-World Scenario
The index was also tested in a 2 km drive, eastbound on Ibsenringen, in downtown Oslo. The relevant virtual gantries are shown in the map in Fig. 4(a). The in-memory flash cache used was 15 pages (15 · 256 B), and the cache was invalidated before test start. Fig. 6 shows the result, in terms of flash accesses and the number of gantries found.
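The role of the small in-memory page cache in such a test can be sketched as below. The paper states only the cache size (15 pages) and that it was invalidated before the drive; the least-recently-used eviction policy, the class name `PageCache` and the `read_page` callback are assumptions made for this illustration.

```python
from collections import OrderedDict


class PageCache:
    """Hypothetical LRU cache of flash pages that counts flash reads."""

    def __init__(self, read_page, capacity=15):
        self.read_page = read_page    # callback: page address -> page bytes
        self.capacity = capacity      # 15 pages of 256 B in the test drive
        self.pages = OrderedDict()    # address -> bytes, oldest entry first
        self.flash_reads = 0

    def get(self, address):
        if address in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(address)
            return self.pages[address]
        # Cache miss: this is the (slow) flash access we want to count.
        self.flash_reads += 1
        data = self.read_page(address)
        self.pages[address] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict least recently used
        return data

    def invalidate(self):
        self.pages.clear()
```

Since a search descends from the root through a few high-fanout nodes, the pages near the root tend to stay cached, and only the lower levels trigger flash reads as the vehicle moves into new cells.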
Figure 6: Flash access in an actual drive (Oslo Ring-1 Eastbound): flash pages read (vertical bars) and virtual gantries found (horizontal lines)

To view our results in the context of the initial problem, we revisit our requirement list from Section 2. We can break our solution into three main features: (i) The use of quadtrees for indexing; (ii) purely functional updates and immutability; and (iii) high fanout, with a 9-by-9 grid. Table 4 summarizes how these features, taken together, satisfy all our requirements. Each feature is either partly or fully relevant for any requirement it helps satisfy (indicated by ◦ or •, respectively).

Our starting point is the need for spatial (two-dimensional) indexing (Req. H), and a desire for simplicity in our solution. The slowness of our hardware made a straightforward linear scan impossible, even with a data set of limited size. This led us to the use of quadtrees, whose primary function, seen in isolation, is satisfying Req. E, CPU efficiency. It also supports Req. C (low memory footprint) by giving us a platform for reducing duplication. Lastly, it supports efficiency in terms of flash page accesses (Req. F), which is primarily handled by high fanout, using the nine-by-nine grid.

The purely functional updates, and the immutable nature of our structure, satisfy a slew of requirements by themselves. Just as in modern version control systems such as Git [7], immutable tree structures where subtrees are shared between versions give us a highly space-efficient way of distributing and storing multiple, incremental versions of the database (Reqs. A to C). This also gives us the ability to keep using the database during an update, and to roll back the update if an error occurs, without any impact on the database use, as the original database is not modified (Req. D). Finally, because modifications will always use new flash pages, we avoid excessive modifications of, say, the root node, and can schedule the list of free nodes to attain a high degree of wear leveling (Req. G).

In our tests, as discussed in the previous section, we found that the solution satisfied our requirements not only conceptually, but also in actual operation. It can contain real-world data within real-world memory constraints (Table 3), and can serve up results in real time during actual operation, with relatively low flash access rates (Fig. 6).

Table 4: How components of the solution satisfy various requirements

Feature                                      A  B  C  D  E  F  G  H
The use of quadtrees for indexing            -  -  ◦  -  •  ◦  -  •
Purely functional updates and immutability   •  •  •  •  -  -  •  -
High fanout, with 9-by-9 grid                -  -  -  -  -  •  -  -

We have described the problem of real-time spatial indexing in an in-vehicle satellite navigation (GNSS) unit for the purposes of open-road tolling. From this problem we have elaborated a set of performance and functionality requirements. These requirements include issues that are not commonly found in indexing for ordinary computers, such as the need for wear leveling over memory locations. By modifying the widely used quadtree data structure to use a higher fanout, and by making it immutable, using purely functional updates, we were able to satisfy our entire list of requirements. We also tested the solution empirically, on real-world data and in a real-world context of a vehicle run, and found that it performed satisfactorily.

Although our object of focus has been a rather limited family of hardware architectures, the simple, basic ideas of our index could be useful also for other devices and applications where real-time spatial indexing is required under somewhat similar flash memory conditions. Possible extensions of our work could be to test the method under different conditions, perhaps by developing a simulator for in-vehicle units with different architectures and parameters. This could be useful for choosing among different hardware solutions, as well as for tuning the index structure and database.

Disclaimer & Acknowledgements
Magnus Lie Hetland introduced the main design idea of using immutable, high-fanout quadtrees for the database structure. He wrote the majority of the text of the current paper, based in large part on the technical report of Lykkja [4]. Ola Martin Lykkja implemented and benchmarked the database structure and documented the experiments [4]. Neither author declares any conflicts of interest. Both authors have revised the paper and approved the final version. The authors would like to thank Hans Christian Bolstad for fruitful discussions on the topic of the paper. This work has in part been financed by the Norwegian Research Council (BIA project no. 210545, “SAVE”).
References

[1] Frank Kelly. Road pricing: addressing congestion, pollution and the financing of Britain’s roads. Ingenia, 29:34–40, December 2006.
[2] Peter Hills and Phil Blythe. For whom the road tolls? Ingenia, 14:21–28, November 2002.
[3] Bern Grush. Road tolling isn’t navigation. European Journal of Navigation, 6(1), February 2008.
[4] Ola Martin Lykkja. SAVE tolling objects database design. Technical Report QFR01-207-1492 0.7, Q-Free ASA, Trondheim, Norway, 2012.
[5] Hanan Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006.
[6] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999.
[7] Jon Loeliger and Matthew McCullough. Version Control with Git. O’Reilly, second edition, 2012.
[8] Kartverket. N2000 kartdata. Available from