Detection of Binary Square Fiducial Markers Using an Event Camera
Hamid Sarmadi, Rafael Muñoz-Salinas, Miguel A. Olivares-Mendez, Rafael Medina-Carnicer
Computing and Numerical Analysis Department, University of Córdoba, Spain
The Maimonides Biomedical Research Institute of Córdoba (IMIBIC), Spain
Space Robotics Research Group, Interdisciplinary Centre for Security, Reliability and Trust (SnT), Université du Luxembourg, Luxembourg
Abstract
Event cameras are a new type of image sensor that outputs changes in light intensity (events) instead of absolute intensity values. They have a very high temporal resolution and a high dynamic range. In this paper, we propose a method to detect and decode binary square markers using an event camera. We detect the edges of the markers by detecting line segments in an image created from the events in the current packet. The line segments are combined to form marker candidates, and the bit value of the marker cells is decoded using the events on their borders. To the best of our knowledge, no other approach exists for detecting square binary markers directly from an event camera. Experimental results show that the performance of our proposal is much superior to that of the RGB ArUco marker detector. Additionally, the proposed method can run in real time on a single CPU thread.
1 Introduction

Event cameras, or “silicon retina” cameras [25], are a new type of image sensor with a fundamentally different approach to sensing. The name silicon retina comes from the similarity of these cameras to the retina in the human eye in the way they sense images. Instead of capturing absolute values for each pixel in the image, they sense the change in brightness at each pixel. This change in brightness is compared to a threshold and, if it is greater, an event for that pixel is produced and communicated. Positive changes in brightness produce the so-called on events and negative changes in brightness produce the so-called off events. Because of the asynchronous nature of these cameras, they are able to produce events with a very high temporal resolution (e.g., 1 microsecond) [16]. Another advantage of these cameras is their very high dynamic range, which makes them suitable in situations with very low or very high illumination or with a high contrast of brightness in the field of view. They also tend to consume less energy than normal image sensors, since they do not need to send a value for every pixel at regular intervals [1].
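To make this sensing model concrete, the following minimal Python sketch (not part of the original paper; the frame inputs, timestamp, and contrast threshold are illustrative assumptions) emits on and off events wherever the per-pixel change in log brightness crosses a threshold:

import numpy as np

def generate_events(prev_log, curr_log, t, threshold=0.2):
    """Toy event-camera model: emit (x, y, t, polarity) wherever the
    per-pixel log-brightness change crosses the contrast threshold."""
    diff = curr_log - prev_log
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    # polarity +1 ("on") for brightness increases, -1 ("off") for decreases
    return [(x, y, t, 1 if diff[y, x] > 0 else -1) for x, y in zip(xs, ys)]

A real sensor performs this comparison asynchronously and independently per pixel; the frame-based loop here is only for illustration.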
All these advantages have resulted in a lot of interest in developing computer vision algorithms for event cameras [24, 27, 49, 4, 16]. Although the currently available event cameras are more expensive and have relatively lower resolution than their conventional counterparts, it is expected that they will become more available, and with higher resolutions, in the future, with companies like Samsung [44] investing in their mass production.

On the other hand, binary square fiducial markers are a popular technology for pose estimation in augmented reality and other applications. There are many examples of them, such as IGD [51], Matrix [36], binARyID [15], ARToolKitPlus [50], and ARTag [13]. One of the most recent and most widely used is the ArUco marker [19], which has been employed in important applications such as medicine and robotics [9, 41, 45, 42, 3, 38]. ArUco markers are also used in more fundamental computer vision problems such as camera calibration, environment mapping, and SLAM [40, 31, 32]. All binary square fiducial markers have a system of ID codes which helps to identify unique markers within each type. These codes are presented on a square cell grid, with black and white colors representing zeros and ones. This visual representation is common to all binary square markers.

Figure 1: The structure of an ArUco marker. The marker is printed on a white background. The border cells of the marker are all black, and the inner cells are used to keep the identification code using black and white colors. Please note that the red lines are for visualization of the cell grid and are not part of the marker.

In this paper, we propose a method to detect and decode square binary markers using an event camera. The fundamental concept we employ is that when a pixel moves from a white to a black binary grid cell it produces off events, and when it moves from a black to a white grid cell it produces on events. Since binary square fiducial markers also have a black border around their code grid, it is also possible to find the edge of the marker that is in the same direction as the marker's movement. This is done by finding a line segment of off events and, correspondingly, a line segment of on events on the opposite edge.

To the best of our knowledge, this is the first work that attempts to decode fiducial planar markers from event cameras in real time using only the CPU. It is possible to convert the events from the camera to an intensity image employing one of the recent methods [6, 29, 4] and then use RGB marker detection algorithms. This has been done in [22], employing the intensity image reconstruction algorithm introduced in [35]. However, these reconstruction methods either have a considerable amount of inaccuracy in the reconstructed intensity image or they need a powerful dedicated GPU to perform their computations. On the other hand, there is another CPU-based method [33] for detecting and decoding markers in event cameras that works by optimizing a generative model. However, it cannot operate in real time despite using much more powerful hardware (a 10-core CPU and 64 GB of RAM) than ours. Our approach can run on a single CPU thread in real time, which makes it more accessible and also suitable for low-powered devices.

The rest of this paper is structured as follows. Section 2 presents the related works; then, in Section 3, the proposed approach is introduced in detail. Section 4 describes the experimental results and discussion for validating our method, and finally Section 5 draws some conclusions and suggests future work.
2 Related Work

Square binary markers are fiducial markers that can be easily detected in images, and their pose with respect to the camera can be efficiently determined given the size of the square marker. One of their most important applications is augmented reality [42, 47, 39]; however, they are also used in other fundamental computer vision problems such as camera calibration [2, 14], 3D reconstruction [8, 37, 41], and SLAM [30, 32]. Additionally, they are applied to unmanned aerial vehicles [38, 3], medicine [41, 42], and swarm robotics [46, 11], amongst other applications.

One of the first works to design a marker with binary squares for coding and a black edge for easy detection is Matrix [36]. To detect the marker, they first binarize the image and then select marker candidate areas by connected component analysis. They then fit a quadrangle to the area and, using its four corners, the pixels are projected to a code frame. The code is then extracted by counting the white and black pixels in each cell. Finally, a CRC-error test is applied to the extracted code to check the validity of the marker. In another work by Fiala, called ARTag [12], marker candidates are detected by finding line segments in the extracted edge image. Where the line segments form a quadrilateral, a candidate is determined. For binarization, they use a threshold obtained from the edges of the marker. They also use CRC code correction to verify the code from the code cells. The set of possible markers is generated in a manner that minimizes the confusion between markers. In another method, presented by Flohr and Fischer [15], a way of generating binary codes is introduced that does not confuse the markers with each other when the marker is rotated. In a more recent approach called ArUco, by Garrido-Jurado et al. [18], a stochastic approach is employed to generate marker codes that maximize their distances from each other. For detection, the marker candidates are determined by contour extraction and then polygon approximation. The four-sided polygons are selected and a homography is calculated to transform them to the standard marker shape. For binarizing, an optimal bimodal image thresholding algorithm is employed. Finally, the code extracted from the cells is looked up in a dictionary to verify the marker ID.
Event cameras are a new type of camera that senses changes in light intensity rather than their absolute value [25]. Since these sensors are fairly new, they are more expensive and have limited resolution compared to regular RGB cameras [16]. However, because of their different design, they are good at sensing with high temporal resolution and high dynamic range, which makes them appropriate for outdoor and industrial applications. Researchers have already attempted to use these cameras for common computer vision problems, including object classification [17], image deblurring [52], face detection [5], person tracking [34], mosaicing [23], and 3D reconstruction [24, 53].

One of the earliest examples of object detection and tracking with event cameras is the work of Litzenberger et al. [26]. They cluster the events in the image spatially and update the clusters when new events arrive. This method is employed to track cars on a highway. Another early work [7], used for balancing a pencil on its tip, estimates the line representing the pencil in the image as a Gaussian in Hough space. They take advantage of a quadratic representation, which is the log of the Gaussian. At the arrival of each event, a quadratic related to its position is added to the Hough space while the previous quadratic is slightly decayed. Later, Schraml et al. [43] presented a method that applied stereo matching to estimate depth using two event cameras. They introduced a tracking algorithm based on finding bounding boxes for each object, where every bounding box contains events that are spatially connected. Piatkowska et al. [34] took advantage of a Gaussian mixture model to track multiple people in the scene using maximum a posteriori probability estimation. After that, Reverter Valeiras et al. [48] tracked an object by assigning bivariate Gaussians to its different parts. The Gaussian trackers are restricted by spring-like links to track the object as a whole. They demonstrate their results for tracking of the human face.
Figure 2: An example of the marker detectable by our algorithm in the camera frame. The marker is detectable when moving roughly parallel to its edges, as visualized by the red and blue arrows on the marker above.

In another work, by Glover et al. [21], a moving ball is detected using a circular Hough transform that also integrates optical flow estimation to reduce false detections in a cluttered scene. Mitrokhin et al. [28] minimize the background noise to track an object with a moving camera through iteratively optimizing a motion model employing the event count image and the event timestamp image.

To our knowledge, there are no attempts at using event cameras to detect and decode binary planar markers in real time using only the CPU. There is a CPU-based method, presented in [33], which decodes QR codes by optimizing a generative model. They first estimate and initialize the motion and affine transformation of the marker and then optimize these two parameters iteratively. At the end, the QR code is optimized to be decoded. Nevertheless, they cannot achieve real-time detection and decoding in spite of employing much more powerful hardware than us. On the other hand, one might use one of the methods for intensity image reconstruction from event cameras and then apply a normal marker detection algorithm, as is done in [22]. However, these methods have some drawbacks, which we explain below. The first example is the work by Brandli et al. [6]. They can achieve a high frame rate on the CPU; however, their algorithm needs an intensity image of the first frame. Furthermore, there is a considerable amount of inaccuracy in their intensity reconstruction. Another method, which creates fairly acceptable reconstructions taking advantage only of events, is presented by Munda et al. [29]. Their reconstruction is done on an event-by-event basis. They achieve real-time performance; however, they need to take advantage of GPU computing to do so. Bardow et al. present an algorithm in [4] which can reconstruct optical flow as well as the intensity image. They use a sliding-window-based algorithm and can achieve near real-time performance. However, they need a dedicated GPU to perform their optimization.
3 Proposed Method

We have developed our method using ArUco markers; however, it is applicable to all binary square markers, since they all contain a binary grid of black and white cells surrounded by black borders. You can see an example of an ArUco marker and its cell grid in Figure 1. As can be observed, the cells on the edges of the grid are colored black, and the marker is supposed to be printed on a white background to be easily detectable. The cells which are not on the border can be colored black or white. A binary code is associated to each marker by assigning 0 and 1 bits to black and white inner cells and forming a binary number by concatenating them. These binary codes come from a dictionary with codes that have a high Hamming distance from each other to reduce their chance of being confused. In the process of detection, the binary code is extracted from the marker and looked up in this dictionary to check if it is valid and to determine its ID.

Our algorithm is capable of detecting markers that move in front of an event camera in directions that are roughly parallel to its edges, as visualized in Figure 2. Event cameras send events in packets that contain the events from a fixed period of time, which is normally very short (e.g., 10 ms). You can find a visual overview of our approach in Figure 3. In the first step, the events from each packet are converted into two separate images, one for on and another for off events. After doing some preprocessing on these images, we detect line segments in them that correspond to the edges of the marker. Then we create marker candidates by combining single line segments from the corresponding on and off images. After that, the pixels within each candidate are unwarped to a standard square representation of the marker. We decode the marker by convolving Gaussian filters at positions where we expect events should exist to decode the color within each cell of the marker. Finally, the reconstructed colors are used to extract the binary code, which is compared to the codes within the marker dictionary to determine the ID number.

3.1 Creation of Event Images

The procedure for creating an image from on events and from off events is identical and is done separately for each event type. Hence, to avoid repetition, the procedure is described without specifying the event type.

Each event e_i = (x, y, t) consists of three attributes: x and y are pixel coordinates and t is the timestamp. Because the pixel at (x, y) can have several events in the packet with different timestamps, t is taken as the minimum timestamp among all events at (x, y).

Let us assume that E is the set containing all events of one type in the current event packet:

    E = { e_i | i = 1 ... N }    (1)

where N is the number of events in E. We create an image I with the same resolution as the event camera for each type of event. Hence we define the set of all pixel locations in the image by:

    G = { 1 ... W } × { 1 ... H }    (2)
Figure 3: Overview of our marker detection algorithm from an event packet. Please note that the red color represents the on events and the line segment detections in the on events image, and the blue color represents the same for off events.

where W ∈ Z+ is the width and H ∈ Z+ is the height of the image in pixels. We also define S as the set of the pixel coordinates where at least one event exists:

    S = { (x, y) ∈ G | ∃ i ∈ {1 ... N} : (x, y, t_i) ∈ E }.    (3)

We call every pixel belonging to S a valid pixel and put I(x, y) = t_i for all (x, y) ∈ S.

In order to simplify the timestamps, we normalize them according to the minimum and maximum timestamps in the packet and put the values in the image I_norm. We also set the non-valid pixels to zero:

    I_norm(x, y) = { (I(x, y) − t_min) / (t_max − t_min),   (x, y) ∈ S
                   { 0,                                      (x, y) ∈ S̄ = G \ S    (4)

where \ is the set difference operator and S̄ is the set of non-valid pixels, and furthermore:

    t_min = min {t_i}_{i=1}^N   and   t_max = max {t_i}_{i=1}^N.    (5)

We were able to get better line segment detections (an increase in marker detections by around 10% of total detections) with 1 − I_norm values as opposed to I_norm values; hence we define:

    I'_norm(x, y) = { 1 − I_norm(x, y),   (x, y) ∈ S
                    { 0,                   (x, y) ∈ S̄    (6)

Next, we fill the holes and refine the events in I'_norm with the following function:

    I_ref(x, y) = { ( Σ_{(x',y') ∈ B_S(x,y)} I'_norm(x', y') ) / |B_S(x, y)|,   (x, y) ∈ F
                  { 0,                                                           (x, y) ∈ F'
                  { I'_norm(x, y),                                               (x, y) ∈ G \ (F ∪ F')    (7)

where:

    B(x, y) = { (x', y') ∈ G | x − 1 ≤ x' ≤ x + 1  ∧  y − 1 ≤ y' ≤ y + 1 }    (8)

is the set of neighbors of pixel position (x, y) in G,

    B_S(x, y) = B(x, y) ∩ S    (9)

is the set of neighbors of (x, y) which contain events,

    F = { (x, y) ∈ S̄ | |B_S(x, y)| > |B(x, y)| / 2 }    (10)

is the set of pixel positions with no event for which most of their neighbors have events, and:

    F' = { (x, y) ∈ S | |B(x, y) ∩ S̄| > |B(x, y)| / 2 }    (11)

is the set of positions with events for which most of their neighbors have no events. After that, we smooth the I_ref image using a 2D Gaussian filter g of size n_s × n_s to obtain the I_smooth image. Here, n_s is the width and height of the Gaussian filter in pixels, which also has the standard deviation σ_s. The smoothing is done by the following function to reduce the noise from the sensor:

    I_smooth(x, y) = { (I_ref ∗ g)(x, y) / (M ∗ g)(x, y),   (x, y) ∈ S'
                     { 0,                                    (x, y) ∈ G \ S'    (12)

Here, S' = (S ∪ F) \ F', which is the set of pixels with valid refined events, and M is an image mask defined by:

    M(x, y) = { 1,   (x, y) ∈ S'
              { 0,   (x, y) ∈ G \ S'.    (13)

In Equation 12, we divide the convolution of the normalized image by the convolution of the mask to eliminate the effect of non-valid pixels in smoothing the valid pixel values.

As said before, the procedure described so far is applied separately to on and off events to obtain two images. The next step is to detect line segments in these images.
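As a compact illustration, the event-image construction above (Equations 1 to 13) could be implemented along the following lines; this is a sketch, not the paper's implementation: the packet format and the default sigma are assumptions, and SciPy's gaussian_filter stands in for the n_s × n_s kernel g.

import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def build_event_image(events, W, H, sigma_s=1.0):
    """Sketch of Equations 1-13 for one event polarity.
    events: iterable of (x, y, t) tuples from a single, non-empty packet."""
    I = np.full((H, W), np.inf)
    for x, y, t in events:                        # keep the minimum
        I[y, x] = min(I[y, x], t)                 # timestamp per pixel
    S = np.isfinite(I)                            # Eq. 3: valid-pixel mask
    t_min, t_max = I[S].min(), I[S].max()
    I_norm = np.zeros((H, W))
    I_norm[S] = (I[S] - t_min) / max(t_max - t_min, 1e-9)   # Eq. 4
    I_prime = np.where(S, 1.0 - I_norm, 0.0)      # Eq. 6

    # Eqs. 7-11: fill holes whose 3x3 neighborhood is mostly valid and
    # drop isolated events whose neighborhood is mostly empty.
    ones = np.ones((3, 3))
    n_valid = convolve(S.astype(float), ones, mode='constant')
    n_total = convolve(np.ones((H, W)), ones, mode='constant')
    F = ~S & (n_valid > n_total / 2)
    F_prime = S & ((n_total - n_valid) > n_total / 2)
    neigh_sum = convolve(I_prime, ones, mode='constant')
    I_ref = np.where(F, neigh_sum / np.maximum(n_valid, 1.0), I_prime)
    I_ref[F_prime] = 0.0

    # Eqs. 12-13: Gaussian smoothing normalized by the smoothed mask so
    # that non-valid pixels do not dilute the valid ones.
    S_prime = (S | F) & ~F_prime
    M = S_prime.astype(float)
    num = gaussian_filter(I_ref * M, sigma_s)
    den = gaussian_filter(M, sigma_s)
    I_smooth = np.where(S_prime, num / np.maximum(den, 1e-9), 0.0)
    return I_smooth, I_norm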
3.2 Detection of Line Segments in Event Images

Since binary square markers have a rectangular black margin on a white background, it is possible to detect their edges in event images when they are moving. A line in the off events image will appear on the edge in the direction of movement, and a line of on events will appear on the edge in the opposite direction of movement. We take advantage of the LSD line segment detector introduced in [20]. We chose this detector because it is faster and produces fewer and more accurate candidates than other methods such as the Hough transformation [10]. We remove line segments that are shorter than a minimum length, l_min, from the output of the LSD algorithm. This helps to remove candidates that are too far away from the camera or have too much of a perspective effect to be decoded.

Let us take L as the set of all detected valid line segments:

    L = { l_i }_{i=1}^{N_L},   l_i = ( (x_i^1, y_i^1), (x_i^2, y_i^2) )    (14)

Here, (x_i^1, y_i^1) ∈ R^2 and (x_i^2, y_i^2) ∈ R^2 are the two ends of the line segment l_i and are positioned within the boundaries of the camera's image resolution. We also define the function:

    length(l_i) = || (x_i^2 − x_i^1, y_i^2 − y_i^1) ||,   i = 1 ... N_L    (15)

which returns the length of the line segment l_i.

In order to have segments that correspond to the same time in the packet's time interval, we correct the position of the line segments so that their occurrence falls in the middle of the time interval. For this purpose, we define a measurement of their age, which is the average normalized timestamp of the pixels that fall on the line segment. More formally, the age of the line segment l = ((x_1, y_1), (x_2, y_2)) is obtained as:

    age(l) = ā,  the mean of the elements a ∈ A(l)    (16)

where

    A(l) = { I_norm(x, y) | M(x, y) = 1  ∧  (x, y) ∈ P(l) }    (17)

and

    P(l) = { (x, y) | x = ⌊s_x + x_1 + 0.5⌋ ∧ y = ⌊s_y + y_1 + 0.5⌋,
             with s_x between 0 and ⌊x_2 − x_1 + 0.5⌋ and s_y between 0 and ⌊y_2 − y_1 + 0.5⌋ }.    (18)

Here P(l) is the set of all pixels on the line segment l, A(l) is the set of pixel values on the segment in image I_norm, ā is the average value of all elements in A(l), and s_x and s_y are the steps in the x and y directions for moving along the line segment.

To make our marker decoding more robust, we would like our line segments to contain events that are in the middle of the packet's time interval, i.e., we want the age of the line segments to be equal to 0.5. However, our line segment detector does not always return segments with such an age. Hence we move the line segment perpendicular to its orientation, in both directions, until we find the position with the right age (0.5). The formal definition of this algorithm is presented in Algorithm 1.
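Before the formal pseudocode, a small sketch of the age measurement (Equations 16 to 18) is given below; the uniform pixel sampling here is an illustrative stand-in for the s_x, s_y stepping of Equation 18:

import numpy as np

def segment_pixels(l):
    """P(l): integer pixels sampled along segment l (cf. Equation 18)."""
    (x1, y1), (x2, y2) = l
    n = int(max(abs(x2 - x1), abs(y2 - y1))) + 1
    return [(int(np.floor(x1 + k * (x2 - x1) / max(n - 1, 1) + 0.5)),
             int(np.floor(y1 + k * (y2 - y1) / max(n - 1, 1) + 0.5)))
            for k in range(n)]

def age(l, I_norm, M):
    """Average normalized timestamp of the valid pixels on l (Eqs. 16-17)."""
    vals = [I_norm[y, x] for x, y in segment_pixels(l) if M[y, x] == 1]
    return float(np.mean(vals)) if vals else float('nan')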
Algorithm 1: Line Age Correction

procedure CorrectAge( l = ((x_1, y_1), (x_2, y_2)) )
    u ← (y_1 − y_2, x_2 − x_1)
    u ← u / ||u||                          ▷ u: unit vector perpendicular to l
    R ← { (0, age(l)) }
    for d ∈ {−1, 1} do
        v ← d·u
        s ← 1
        l_new ← ( s·v + (x_1, y_1), s·v + (x_2, y_2) )
        while |A(l_new)| ≥ |P(l_new)| / 2 do
            R ← R ∪ { (s·d, age(l_new)) }
            s ← s + 1
            l_new ← ( s·v + (x_1, y_1), s·v + (x_2, y_2) )
        end while
    end for
    (α̂, β̂) ← argmin_{α,β} Σ_{(x,y) ∈ R} (α·x + β − y)²   ▷ linear regression over the tuples in R
    t ← (0.5 − β̂) / α̂
    return ( t·u + (x_1, y_1), t·u + (x_2, y_2) )
end procedure

We correct the age of all detected line segments. Hence we define:

    L_corrected = { CorrectAge(l_i) }_{i=1}^{N_L}    (19)

The next step is to use the line segments in L_corrected to form candidates for marker detection.

3.3 Forming Marker Candidates

In order to detect candidates for marker detection, we need to match line segments from on events to the ones from off events. We impose some conditions to make it more likely for the matched line segments to represent a valid marker candidate. Let us assume that L_corrected^on and L_corrected^off are the corrected line segments related to on and off events, respectively, such that:

    L_corrected^off = { l_i^off }_{i=1}^{N_L^off},   L_corrected^on = { l_i^on }_{i=1}^{N_L^on}    (20)

Then we form the set of our marker candidates, C, in the following way:

    C = { (l_i^on, l_j^off) |  length(l_i^on) < k · length(l_j^off)
                             ∧ length(l_j^off) < k · length(l_i^on)
                             ∧ [ Project(l_i^on, l_j^off) ∨ Project(l_j^off, l_i^on) ]
                             ∧ γ(l_i^on, l_j^off) ≤ γ_max }    (21)

Figure 4: Summary of our marker detection algorithm. First the marker candidate in (a) is unwarped to standard square images related to on (b) and off (c) events. Then Gaussian kernels on a grid (d) are convolved with the images and the responses at the convolution points are saved for on (f) and off (e) events. Finally, the responses are used to reconstruct the marker (g).

Here k > 1 is a fixed threshold on the ratio of the segment lengths, γ(l_i^on, l_j^off) is the minimum angle between the two line segments l_i^on and l_j^off, γ_max is a fixed threshold on that angle, and Project(·, ·) is the function that takes two line segments and checks whether at least one of the end points of one segment projects within the end points of the other:

    Project( (b⃗_1, e⃗_1), (b⃗_2, e⃗_2) ) =
          [ 0 ≤ (e⃗_1 − b⃗_1) · (b⃗_2 − b⃗_1) / ||e⃗_1 − b⃗_1|| ≤ ||e⃗_1 − b⃗_1|| ]
        ∨ [ 0 ≤ (e⃗_1 − b⃗_1) · (e⃗_2 − b⃗_1) / ||e⃗_1 − b⃗_1|| ≤ ||e⃗_1 − b⃗_1|| ]    (22)

Here (b⃗_1, e⃗_1) and (b⃗_2, e⃗_2) are, respectively, the coordinate vector pairs of the beginning and end points of the first and the second input line segments of Project(·, ·). Now that we have formed the candidates for marker detection, we will attempt to decode them.
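As a concrete sketch of the pairing test just described (Equations 21 and 22), the following Python fragment checks the length-ratio, projection, and angle conditions; the default values of max_ratio and max_angle are illustrative stand-ins for the fixed thresholds k and γ_max:

import numpy as np

def project_within(seg_a, seg_b):
    """True if an endpoint of seg_b projects between the endpoints of
    seg_a (cf. Equation 22)."""
    b1, e1 = np.asarray(seg_a[0], float), np.asarray(seg_a[1], float)
    d = e1 - b1
    L = np.linalg.norm(d)
    for p in seg_b:
        s = np.dot(d, np.asarray(p, float) - b1) / L   # scalar projection
        if 0.0 <= s <= L:
            return True
    return False

def angle_between(seg_a, seg_b):
    """Minimum angle between the directions of two segments."""
    da = np.subtract(seg_a[1], seg_a[0])
    db = np.subtract(seg_b[1], seg_b[0])
    c = abs(np.dot(da, db)) / (np.linalg.norm(da) * np.linalg.norm(db))
    return np.arccos(np.clip(c, 0.0, 1.0))

def form_candidates(on_segs, off_segs, max_ratio=2.0, max_angle=np.pi / 8):
    """Pair on/off segments compatible with two opposite marker edges
    (cf. Equation 21)."""
    length = lambda s: np.linalg.norm(np.subtract(s[1], s[0]))
    C = []
    for lon in on_segs:
        for loff in off_segs:
            if (length(lon) < max_ratio * length(loff)
                    and length(loff) < max_ratio * length(lon)
                    and (project_within(lon, loff) or project_within(loff, lon))
                    and angle_between(lon, loff) <= max_angle):
                C.append((lon, loff))
    return C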
3.4 Decoding the Marker Candidates

The first step in decoding a marker candidate is to unwarp it to a square image with standard dimensions. To decode the candidates we use the image I_norm, which was defined in Section 3.1. To unwarp the candidates for decoding, we find the perspective transformation that maps the corners of the marker in the I_norm image to the corners of the target image with standard dimensions. We denote the unwarped image for the i-th candidate by I_C^i:

    I_C^i = UnwarpPerspective( c_i, s_c, I_norm ),   i = 1 ... N_C    (23)

Here, s_c is the dimension of the standard square image in pixels, c_i is the i-th marker candidate from C, and N_C is the number of marker candidates; hence:

    C = { c_i }_{i=1}^{N_C}.    (24)

The unwarping is done in such a manner that the edge detected by the line segment from the off events is always on the left and the one from the on events is on the right. Therefore, the orientation of the unwarped marker is as if it is always moving towards the left.

You can see an example of unwarping the candidate from I_norm in Figures 4(b) and 4(c), which are related to on and off events respectively. The polygon representing the candidate is shown with white borders in Figure 4(a), where the I_norm image of the on events (in red) and the I_norm image of the off events (in blue) are shown overlaid on top of each other.

After obtaining the unwarped standard candidate images, they are segmented into square regions (inner cells) according to ArUco's specification. For each of these cells, we need to determine if the color inside is white or black. Since in the standard unwarped images the marker is always supposed to move to the left, we check the events on the left side of each cell. This is done by convolving a 2D Gaussian kernel (with standard deviation σ_d) on each square cell's left edge.
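One possible realization of the unwarping step (Equation 23) with OpenCV is sketched below; the corner ordering, which pins the off edge to the left side and the on edge to the right side of the standard image, is an assumption of this example:

import cv2
import numpy as np

def unwarp_candidate(I_norm, corners, s_c=160):
    """Map the candidate quadrilateral `corners` (ordered so that the
    off-edge endpoints come first) in I_norm to an s_c x s_c square."""
    src = np.asarray(corners, dtype=np.float32)        # 4 x 2 marker corners
    dst = np.array([[0, 0], [0, s_c - 1],              # off edge -> left side
                    [s_c - 1, s_c - 1], [s_c - 1, 0]], # on edge -> right side
                   dtype=np.float32)
    H = cv2.getPerspectiveTransform(src, dst)          # homography of Eq. 23
    return cv2.warpPerspective(I_norm.astype(np.float32), H, (s_c, s_c))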
Figure 5: Details of decoding the markers from the Gaussian convolution responses. From left to right: first, the responses r_ij of the Gaussian convolutions are stored for each code cell of the marker. Then the response values are normalized according to the highest response and thresholded, giving f_ij. After that, the cells b_{i,j} in each row of the marker are decoded according to the normalized thresholded responses of the on and off events and the decoded value of the cell on the left. Finally, the marker is decoded and can be reconstructed.

The grid used for decoding, along with the positions where the Gaussians are convolved with the unwarped images, is shown in Figure 4(d), where you can see a visualization of the Gaussians on the left edge of the cells. Although, in theory, the events should appear exactly on the edges of the cell squares on the marker, in practice, because the on and off events are not perfectly synchronized, this might not be the case. For example, in Figure 4(f) you can see that the on events on the edge of each square are slightly shifted to the left, while for off events (Figure 4(e)) this issue does not exist. For this reason, we have to shift the convolution point of the Gaussian kernel slightly to the left for off pixels. We set the amount of shifting to a fixed fraction of n_d, where n_d is the side length of each cell square (and also of the Gaussian kernel) in pixels. When the convolutions are applied, they result in values used to determine the color of each cell of the marker. The scores within each cell are shown in Figures 4(e) and 4(f).

Let us assume that r_ij is the response of the convolution related to the inner cell in the i-th row and j-th column of the marker. We normalize these responses according to the maximum response and then threshold them:

    f_ij = [ ( r_ij / max_{i,j} r_ij ) ≥ θ ],   i, j ∈ { 1 ... N_m }    (25)

where N_m is the size of the code square of the ArUco marker, θ is the threshold for the cell response, and f_ij is 1 if the normalized response exceeds the threshold and 0 otherwise, determining whether events occur on the left edge of the cell. We obtain this value separately for on and off events; hence, we have both f_ij^on and f_ij^off for each cell at position (i, j). Now we can determine the color code within each cell of the marker as follows:

    b_{i,j} = { 0,              j = 0
             { 1 − b_{i,j−1},   (b_{i,j−1} = 0 ∧ f_ij^on = 1) ∨ (b_{i,j−1} = 1 ∧ f_ij^off = 1)
             { b_{i,j−1},       otherwise    (26)

where b_{i,j}, for all i = 1 ... N_m and j = 0 ... N_m, determines the color code at cell (i, j) of the marker. In Figure 5 you can find the r_{i,j}, f_{i,j}, and b_{i,j} values for an example of a marker that is successfully reconstructed. In the last stage, the binary code is extracted from the reconstructed marker and checked for its validity and its ID according to the marker specifications. In the case of ArUco markers, this is done by concatenating each row of cell codes and looking up the resulting binary number in the marker dictionary. This is done four times, once for each rotation of the reconstructed marker. If the code is not found in the dictionary, we reject it as a false candidate; otherwise, its ID is extracted from the dictionary.
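Putting Equations 25 and 26 and the dictionary lookup together, a minimal sketch of the decoding stage could look as follows; the theta default and the `dictionary` argument, mapping binary codes to marker IDs, are hypothetical stand-ins for the ArUco configuration:

import numpy as np

def decode_cells(r_on, r_off, theta=0.5):
    """Reconstruct the cell colors b[i][j] from the Gaussian convolution
    responses (cf. Equations 25-26). r_on/r_off are N_m x N_m arrays."""
    f_on = (r_on / max(r_on.max(), 1e-9)) >= theta     # Eq. 25, on events
    f_off = (r_off / max(r_off.max(), 1e-9)) >= theta  # Eq. 25, off events
    N_m = r_on.shape[0]
    b = np.zeros((N_m, N_m + 1), dtype=int)  # column 0: black border cell (0)
    for i in range(N_m):
        for j in range(1, N_m + 1):
            prev = b[i, j - 1]
            if (prev == 0 and f_on[i, j - 1]) or (prev == 1 and f_off[i, j - 1]):
                b[i, j] = 1 - prev   # a transition on the cell's left edge
            else:
                b[i, j] = prev       # no transition: keep the previous color
    return b[:, 1:]

def lookup_id(bits, dictionary):
    """Try the four rotations of the decoded grid in the marker dictionary."""
    for _ in range(4):
        code = int("".join(str(v) for v in bits.flatten()), 2)
        if code in dictionary:
            return dictionary[code]
        bits = np.rot90(bits)
    return None  # not in the dictionary: reject as a false candidate

Because b_{i,0} is the black border, each row is decoded from left to right, flipping the color only when a thresholded response signals a transition.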
The eventcamera was attached to a camera rig together with a colorcamera to verify the current scene in the color image as The recording was done in the Interdisciplinary Centre for Se-curity, Reliability and Trust (SnT) at University of Luxembourg vent Camera Color Camera (a) (b) Figure 6: Our experimental setup including (a) the camera rig we employed for capturing our dataset and (b) the × marker grid we used for evaluation of our algorithm. The color and the event camera are fixed on the samerod facing the same direction. The sequences were captured moving the camera rig by hand in left-right and up-downdirections while pointing perpendicularly at the marker grid.well. The color camera was an IDS UI-1220LE-C-HQ witha global shutter and a resolution of × pixels at fps. Figure 6(a) shows our camera rig setup. We ranour algorithm on a laptop with the Core(TM) i7-4700HQCPU and 8 GB of RAM. We implemented our algorithmin C++ and did our experiments under the Ubuntu 18.04operating system. Our implementation only needs a singleCPU core to run. To reduce the noise from the camerato some extent we employed the DVS noise filter from thelibcaer library .We did both quantitative and qualitative evaluations ofour algorithm using the data we captured from the eventand color camera. To do so we captured sequences byholding the camera close to a grid of markers and movingit in different directions with respect to the markers whilepointing perpendicularly at them. The recording was donein an environment with controlled lighting and low ambi-ent light noise. You can see an image of the marker gridin Figure 6(b). Because of the low resolution of the eventcamera, we had to keep the cameras close to the markers,at approximately to cm away from them. In thisway, only one marker was visible at a time in the field ofview of both cameras.The captured sequences contain packets in totalwhere each packet contains the events from a ms timeinterval. We aggregated all packets and color frames fromthe sequences together for our quantitative evaluation. Fora minority number of packets ( < ), there were distor-tions in the events because of the bandwidth overflow, wediscarded these packets from our evaluations.The parameter values we have employed for our algo-rithm during our experiments are shown in Table 1. We ran our implementation on the captured sequencesand measured the time needed for processing each packet. https://gitlab.com/inivation/dv/libcaer Symbol Value Description W
128 Event image width H
128 Event image height n s I smooth σ s . Gaussian filter sigma for I smooth l min
25 Minimum line segment size s c
160 Standard marker image size n d
20 Cells size for marker image σ d θ θ (whichhas no unit) is pixels.Then we calculated the average of the durations whichturned out to be . milliseconds per packet on a sin-gle CPU core. This means that our implementation canhandle packets per second which are already higherthan the packets that our camera produces in eachsecond. Hence we were able to run our algorithm in real-time. For further analysis, we also calculated the amountof time needed for different steps of the algorithm. Thevalues are presented in Table 2 in milliseconds. As youcan see the most time-consuming part of the algorithmis unwarping the candidates to the standard representa-tion which takes . ± . ms. It also has a relativelyhigh standard deviation value, the reason is that in someframes many marker candidates are formed while in oth-ers, due to less detected line segments, fewer candidatesare generated which creates the variance in time neededfor unwarping. After candidate unwarping, the LSD linesegment detection and candidate formation takes most ofthe time with the average of . ± . ms.8tep of Algorithm Processing Time Per PacketEvent Image Creationand the rest . ± . msSegment Detection andCandidate Formation . ± . msCandidate Unwarping . ± . msMarker Decoding andCode Look-up . ± . msTotal . ± . msTable 2: The average processing time and standard devi-ation for different steps of our algorithm and also in totalfor processing each packet (in milliseconds). In our evaluation sequences, we moved the camera rigin side-to-side and up-and-down motions in front of themarker grid. We analyzed the sequences manually to sep-arate the frames (packets) where a marker is completelyvisible in the event camera.We measured two metrics. First, the percentage of timesa marker passes in front of the camera and is detected.And second, the number of frames in which the marker isdetected while visible in the event camera’s frame. Moreprecisely, we define that a marker “passes” in front of acamera when the whole marker appears in the field ofview of the camera and moves in front of it until a partof it exits the field of view. If the marker is detected atany time while “passing”, we say that we have a “pass de-tection”. Likewise, frame detections are counted withinframes where the whole marker is within the field of viewof the camera.Table 3 presents the results related to pass detectionrate and frame detection rate for the color camera (usingArUco algorithm) and for the event camera (using our al-gorithm). As can be observed our method has a very highpass detection rate ( %) especially compared to that ofArUco’s (3%). The low frame detection and pass detectionrate of ArUco suggests the high sensitivity of the ArUcoalgorithm to motion blur. On the other hand, it can beseen that our approach has detected % of the detectableframes, while ArUco only is able to detect %.We believe that the missed detections of our methodare due to several reasons. First, since the camera rigis moved by hand in a lateral manner, it needs to stopat some points and change direction. In these frames,the movement slows down for a moment and then changesdirection which causes few events to be created, and henceour algorithm would not perform well. Another reason isthat at some frames the camera was not moving parallelenough to the edges of the marker (as it should similar toFigure 2). 
In these cases, our algorithm could not perform correctly; however, this is a limitation of the design.

In order to make a deeper comparison, we also calculated the frame detection rate only for frames where the whole marker is in the field of view of both cameras. We also counted the frames where both ArUco and our algorithm detect the marker in the image, i.e., where there was a "mutual" detection between the ArUco algorithm and our algorithm. The results are shown in Table 4. As can be seen, there were no frames that were mutually detected by both cameras. This indicates that our algorithm is a very good complement to the normal ArUco detection method. On the other hand, our detection rate (35.34%) is still more than an order of magnitude better than that of ArUco (1.67%), which suggests the extreme sensitivity of the ArUco algorithm to motion blur and that our algorithm handles moving markers much better.

Table 3: Accuracy of detected passes and detected frames when the whole marker is in the field of view, calculated independently for the color camera using the ArUco algorithm and for the event camera using our algorithm.

        | Frame Detection (%) | Pass Detection (%)
ArUco   | 1.58                | 3.39
Ours    | 44.16               | 93.63

Table 4: Accuracy of detected frames when the marker is in the field of view of both cameras.

Accuracy of Frame Detection in Common Frames (%)
Mutual | ArUco | Ours
0.00   | 1.67  | 35.34

For qualitative evaluation, we visualize different steps of our marker detection algorithm for successful and unsuccessful cases. First, successful cases for different markers are presented in Figure 7. These cases are drawn from the sequences used for quantitative evaluation, where the marker is detectable in the color camera as well as in the event camera. The picture from the color camera, the marker candidate, the normalized thresholded Gaussian responses for on and off events, and the marker reconstruction are visualized for each case in columns (a) to (e). The marker is moving horizontally with respect to the camera in rows 1 to 3 and vertically in rows 4 to 6. As can be seen, despite the high frame rate of the color camera, the images are very blurry. As established in the quantitative results, in none of the color images of Figure 7 was the ArUco algorithm able to detect the marker, due to its high sensitivity to blurred images. The normalized thresholded Gaussian responses in columns (c) and (d) make it possible to see how the decoded marker is reconstructed.

Figure 7: Examples of successful detections in our dataset. For each case, the image from the RGB camera (a), the candidate overlaid on the events (b), the Gaussian responses overlaid on the unwarped on (c) and off (d) events, and finally the reconstruction of the marker in the event camera's field of view (e) are presented.
Please note that the on and off events in column (b) are shown in red and blue colors respectively.

Figure 8: Cases where marker candidates cannot be formed properly because of unsuitable line segment detections. These include: (a) zooming into or out of the marker, (b) rotating with the center of rotation within the marker, and slow (c) or diagonal (d) movement of the marker.

Four different types of movement for which our detection algorithm cannot function properly are shown in Figure 8. In each category, you can see the event image for on (red) and off (blue) events in the first row. The smoothed event image and the line segment detections are shown in the second and third rows, separately for on and off events. In the first column (a), the camera is zooming out from the marker. As can be observed, there are no line segment detections in the on events representing any of the marker edges. This is because, when the zoom center is within the marker, all its edges appear to move away from or towards the zoom center. Hence the marker edges are not detectable for either on or off events and no candidate can be formed. The second column (b) shows a marker rotating with the center of rotation within the marker. As can be seen, each edge of the marker produces both on and off events in different places. This prevents the whole edge from being detected in either type of event, which prevents the formation of a correct candidate. Column (c) represents the case where the camera is moving too slowly. This results in the production of few events, which in turn makes it impossible to detect the marker edges properly using line segments. Finally, in the last column (d), the line segment detections for a marker moving diagonally are shown. Although it might seem fine in the pictures, the detected line segments turn out to be too long for the off events. This is because the off events from two neighboring edges combine, which makes these edges appear longer. This becomes problematic when the marker is unwarped for decoding, as shown in the second row of Figure 9. The distortion in unwarping the events image in column (b), row 2, is especially noticeable if attention is paid to the on events formed on the top edge. This distortion makes the events move to the wrong cell and finally be wrongly decoded, as is visible in the decoded result in column (d) of the second row. There is another problem with diagonal movement that can happen even if the marker candidate is formed properly. This is visualized in the first row of Figure 9. The issue is that, when moving diagonally, events from the cells below can leak into the cells above, or vice versa. This can change the result of the Gaussian convolutions and change the decoded color of a cell. More specifically, the events leaked from the lower-left or upper-left cells can make it appear that there is a change of color from the left cell to the current cell and hence produce a wrong color for the current cell in the decoding process. The affected cells in our example are indicated with yellow circles in column (b) of the first row of Figure 9. As you can see in column (d) of the first row, the resulting decoded marker has changed significantly.

As mentioned before, it is possible to use an intensity image reconstruction method to convert the events into an intensity image in real time and then apply a traditional marker detection algorithm. However, the good-quality methods need a powerful dedicated GPU. As the first attempt to detect and decode binary planar markers in real time using only the CPU, our method has the limitations mentioned in this section. Nevertheless, we think that it is possible to alleviate these problems by taking into account extra parameters such as the motion vector of the marker. However, this is left for future work.
Figure 9: Problems with diagonal movement in marker decoding. Problems with leaking events and with candidate edges that are too long are shown in the first and second rows respectively. In each case, the marker candidate (a), the thresholded normalized Gaussian responses for on (b) and off (c) events, the decoded marker (d), and the correct marker (e) are visualized.

5 Conclusion

A method for decoding square binary markers from the output of an event camera has been proposed in this paper. The algorithm can run in real time on a single CPU
core without the need for specialized hardware such as dedicated GPUs. To the best of our knowledge, this is the first attempt to decode fiducial planar markers using event cameras. An additional contribution of this paper is that the different steps we have proposed for processing events can be helpful in the design of solutions for other problems using these cameras.

Experimental results show that our method is much superior to the RGB marker detector in the case of ArUco for fast-moving markers. This is because the intensity-based methods assume low blur in the image, while we demonstrated that even with a high frame rate there can be significant motion blur. Hence, the proposed method is ideal in settings where objects move very fast in directions roughly parallel or perpendicular to the camera, such as a production line.

On the other hand, for situations other than fast lateral (or up/down) movements, our algorithm could be improved. As future work, we propose to work on these cases, including diagonal movement, rotations, and zooming of the camera. We suggest it might be possible to overcome these limitations by taking into account extra parameters such as the motion vector. Another matter that could be tested in future work is how robust the method is in different lighting conditions and in the presence of ambient light noise.

Acknowledgments
This project has been funded under projects TIN2019-75279-P and IFI16/00033 (ISCIII) of the Spanish Ministry of Economy, Industry and Competitiveness, and FEDER. In addition, this work was done in collaboration with the Automation & Robotics research group at the Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg.
References

[1] A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. Di Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendoza, J. Kusnitz, M. Debole, S. Esser, T. Delbruck, M. Flickner, and D. Modha. A Low Power, Fully Event-Based Gesture Recognition System. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7388–7397, July 2017.

[2] Bradley Atcheson, Felix Heide, and Wolfgang Heidrich. CALTag: High Precision Fiducial Markers for Camera Calibration. In Vision, Modeling, and Visualization (2010). The Eurographics Association, 2010.

[3] Jan Bacik, Frantisek Durovsky, Pavol Fedor, and Daniela Perdukova. Autonomous flying with quadrocopter using fuzzy control and ArUco markers. Intelligent Service Robotics, 10(3):185–194, July 2017.

[4] P. Bardow, A. J. Davison, and S. Leutenegger. Simultaneous Optical Flow and Intensity Estimation from an Event Camera. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 884–892, June 2016.

[5] S. Barua, Y. Miyatani, and A. Veeraraghavan. Direct face detection and video reconstruction from event cameras. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1–9, March 2016.

[6] Christian Brandli, Lorenz Muller, and Tobi Delbruck. Real-time, high-speed video decompression using a frame- and event-based DAVIS sensor. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 686–689, June 2014.

[7] J. Conradt, M. Cook, R. Berner, P. Lichtsteiner, R. J. Douglas, and T. Delbruck. A pencil balancing robot using a pair of AER dynamic vision sensors. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 781–784, May 2009.

[8] Joseph DeGol, Timothy Bretl, and Derek Hoiem. Improved Structure from Motion Using Fiducial Marker Matching. pages 273–288, 2018.

[9] Ankit Dhall, Kunal Chelani, Vishnu Radhakrishnan, and K. M. Krishna. LiDAR-Camera Calibration using 3D-3D Point correspondences. arXiv:1705.09785 [cs], May 2017.

[10] Richard O. Duda and Peter E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11–15, January 1972.

[11] Hadi Fekrmandi, Skye Rutan-Bedard, Alexander Frye, and Randy Hoover. Validation of Vision-Based State Estimation for Localization of Agents and Swarm Formation. In Pierre Larochelle and J. Michael McCarthy, editors, Proceedings of the 2020 USCToMM Symposium on Mechanical Systems and Robotics, Mechanisms and Machine Science, pages 216–224, Cham, 2020. Springer International Publishing.

[12] M. Fiala. ARTag, a fiducial marker system using digital techniques. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 590–596, June 2005.

[13] M. Fiala. Designing Highly Reliable Fiducial Markers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7):1317–1324, July 2010.

[14] Mark Fiala and Chang Shu. Self-identifying patterns for plane-based camera calibration. Machine Vision and Applications, 19(4):209–216, July 2008.

[15] Daniel Flohr and Jan Fischer. A Lightweight ID-Based Extension for Marker Tracking Systems. In Bernd Froehlich, Roland Blach, and Robert van Liere, editors, Eurographics Symposium on Virtual Environments, Short Papers and Posters. The Eurographics Association, 2007.

[16] Guillermo Gallego, Tobi Delbruck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew Davison, Joerg Conradt, Kostas Daniilidis, and Davide Scaramuzza. Event-based Vision: A Survey. arXiv:1904.08405 [cs], April 2019.

[17] S. Gao, G. Guo, H. Huang, X. Cheng, and C. L. P. Chen. An End-to-End Broad Learning System for Event-Based Object Classification. IEEE Access, 8, 2020.

[18] S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, and M. J. Marín-Jiménez. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognition, 47(6):2280–2292, June 2014.

[19] S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, and R. Medina-Carnicer. Generation of fiducial marker dictionaries using Mixed Integer Linear Programming. Pattern Recognition, 51:481–491, March 2016.

[20] Rafael Grompone von Gioi, Jérémie Jakubowicz, Jean-Michel Morel, and Gregory Randall. LSD: a Line Segment Detector. Image Processing On Line, 2:35–55, March 2012.

[21] A. Glover and C. Bartolozzi. Event-driven ball detection and gaze fixation in clutter. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2203–2208, October 2016.

[22] Ondřej Holešovský, Václav Hlaváč, and Roman Vítek. Practical high-speed motion sensing: event cameras vs. global shutter. page 9, Rogaška Slatina, Slovenia, February 2019.

[23] Hanme Kim, Ankur Handa, Ryad Benosman, Sio-Hoi Ieng, and Andrew Davison. Simultaneous Mosaicing and Tracking with an Event Camera. In Proceedings of the British Machine Vision Conference 2014, pages 26.1–26.12, Nottingham, 2014. British Machine Vision Association.

[24] Hanme Kim, Stefan Leutenegger, and Andrew J. Davison. Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, Lecture Notes in Computer Science, pages 349–364, Cham, 2016. Springer International Publishing.

[25] P. Lichtsteiner, C. Posch, and T. Delbruck. A 128 × 128 120dB 30mW asynchronous vision sensor that responds to relative intensity change. In IEEE International Solid-State Circuits Conference (ISSCC), pages 2060–2069, February 2006.

[26] M. Litzenberger, C. Posch, D. Bauer, A. N. Belbachir, P. Schon, B. Kohn, and H. Garn. Embedded Vision System for Real-Time Object Tracking using an Asynchronous Transient Vision Sensor. pages 173–178, September 2006.

[27] A. I. Maqueda, A. Loquercio, G. Gallego, N. García, and D. Scaramuzza. Event-Based Vision Meets Deep Learning on Steering Prediction for Self-Driving Cars. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5419–5427, June 2018.

[28] A. Mitrokhin, C. Fermüller, Chethan Parameshwara, and Y. Aloimonos. Event-Based Moving Object Detection and Tracking. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.

[29] Gottfried Munda, Christian Reinbacher, and Thomas Pock. Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation. International Journal of Computer Vision, 126(12):1381–1393, December 2018.

[30] Rafael Muñoz-Salinas, Manuel J. Marín-Jimenez, and R. Medina-Carnicer. SPM-SLAM: Simultaneous localization and mapping with squared planar markers. Pattern Recognition, 86:156–171, February 2019.

[31] Rafael Muñoz-Salinas, Manuel J. Marín-Jimenez, Enrique Yeguas-Bolivar, and R. Medina-Carnicer. Mapping and localization from planar markers. Pattern Recognition, 73:158–171, January 2018.

[32] Rafael Muñoz-Salinas and R. Medina-Carnicer. UcoSLAM: Simultaneous localization and mapping by fusion of keypoints and squared planar markers. Pattern Recognition, 101:107193, May 2020.

[33] J. Nagata, Y. Sekikawa, K. Hara, T. Suzuki, and A. Yoshimitsu. QR-code Reconstruction from Event Data via Optimization in Code Subspace. In IEEE Winter Conference on Applications of Computer Vision (WACV), pages 2113–2121, March 2020.

[34] E. Piątkowska, A. N. Belbachir, S. Schraml, and M. Gelautz. Spatiotemporal multiple persons tracking using Dynamic Vision Sensor. pages 35–40, June 2012.

[35] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza. Events-To-Video: Bringing Modern Computer Vision to Event Cameras. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3852–3861, June 2019.

[36] J. Rekimoto. Matrix: a realtime object identification and registration method for augmented reality. In Proceedings. 3rd Asia Pacific Computer Human Interaction (Cat. No.98EX110), pages 63–68, July 1998.

[37] M. Rumpler, S. Daftry, A. Tscharf, R. Prettenthaler, C. Hoppe, G. Mayer, and H. Bischof. Automated End-to-End Workflow for Precise and Geo-accurate Reconstructions using Fiducial Markers. In ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, volume II-3, pages 135–142. Copernicus GmbH, August 2014.

[38] M. F. Sani and G. Karimian. Automatic navigation and landing of an indoor AR.drone quadrotor using ArUco marker and inertial sensors. pages 102–107, November 2017.

[39] Sergey Sannikov, Fedor Zhdanov, Pavel Chebotarev, and Pavel Rabinovich. Interactive Educational Content Based on Augmented Reality and 3D Visualization. Procedia Computer Science, 66:720–729, January 2015.

[40] Hamid Sarmadi, Rafael Muñoz-Salinas, M. A. Berbís, and R. Medina-Carnicer. Simultaneous Multi-View Camera Pose Estimation and Object Tracking With Squared Planar Markers. IEEE Access, 7:22927–22940, 2019.

[41] Hamid Sarmadi, Rafael Muñoz-Salinas, M. Álvaro Berbís, Antonio Luna, and R. Medina-Carnicer. 3D Reconstruction and alignment by consumer RGB-D sensors and fiducial planar markers for patient positioning in radiation therapy. Computer Methods and Programs in Biomedicine, 180:105004, October 2019.

[42] Hamid Sarmadi, Rafael Muñoz-Salinas, M. Álvaro Berbís, Antonio Luna, and R. Medina-Carnicer. Joint Scene and Object Tracking for Cost-Effective Augmented Reality Assisted Patient Positioning in Radiation Therapy. arXiv:2010.01895 [cs], October 2020.

[43] S. Schraml, A. N. Belbachir, N. Milosevic, and P. Schön. Dynamic stereo vision system for real-time tracking. In Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pages 1409–1412, May 2010.

[44] B. Son, Y. Suh, S. Kim, H. Jung, J. Kim, C. Shin, K. Park, K. Lee, J. Park, J. Woo, Y. Roh, H. Lee, Y. Wang, I. Ovsiannikov, and H. Ryu. 4.1 A 640×480 dynamic vision sensor with a 9µm pixel and 300Meps address-event representation. In IEEE International Solid-State Circuits Conference (ISSCC), pages 66–67, February 2017.

[45] H. Su, C. Yang, G. Ferrigno, and E. De Momi. Improved Human–Robot Collaborative Control of Redundant Robot for Teleoperated Minimally Invasive Surgery. IEEE Robotics and Automation Letters, 4(2):1447–1453, April 2019.

[46] Ryo Suzuki, Clement Zheng, Yasuaki Kakehi, Tom Yeh, Ellen Yi-Luen Do, Mark D. Gross, and Daniel Leithinger. ShapeBots: Shape-changing Swarm Robots. In Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, UIST '19, pages 493–505, New York, NY, USA, October 2019. Association for Computing Machinery.

[47] Leila Besharati Tabrizi and Mehran Mahvash. Augmented reality-guided neurosurgery: accuracy and intraoperative application of an image projection technique. Journal of Neurosurgery, 123(1):206–211, July 2015.

[48] D. Reverter Valeiras, X. Lagorce, X. Clady, C. Bartolozzi, S. Ieng, and R. Benosman. An Asynchronous Neuromorphic Event-Driven Visual Part-Based Shape Tracking. IEEE Transactions on Neural Networks and Learning Systems, 26(12):3045–3059, December 2015.

[49] V. Vasco, A. Glover, and C. Bartolozzi. Fast event-based Harris corner detection exploiting the advantages of event-driven cameras. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4144–4149, October 2016.

[50] Daniel Wagner and Dieter Schmalstieg. ARToolKitPlus for Pose Tracking on Mobile Devices. In Computer Vision Winter Workshop, January 2007.

[51] Xiang Zhang, S. Fronz, and N. Navab. Visual marker detection and decoding in AR systems: a comparative study. In Proceedings. International Symposium on Mixed and Augmented Reality, pages 97–106, October 2002.

[52] L. Zhang, H. Zhang, J. Chen, and L. Wang. Hybrid Deblur Net: Deep Non-Uniform Deblurring With Event Camera.