Various methods for queue length and traffic volume estimation using probe vehicle trajectories

Abstract

The rapid development of connected vehicle technology and the emergence of ride-hailing services have enabled the collection of a tremendous amount of probe vehicle trajectory data. Due to the large scale, the trajectory data have become a potential substitute for the widely used fixed-location sensors in terms of the performance measures of transportation networks. Specifically, for traffic volume and queue length estimation, most of the trajectory data based methods in the existing literature either require high market penetration of the probe vehicles to identify the shockwave or require the prior information about the queue length distribution and the penetration rate, which may not be feasible in the real world. To overcome the limitations of the existing methods, this paper proposes a series of novel methods based on probability theory. By exploiting the stopping positions of the probe vehicles in the queues, the proposed methods try to establish and solve a single-variable equation for the penetration rate of the probe vehicles. Once the penetration rate is obtained, it can be used to project the total queue length and the total traffic volume. The validation results using both simulation data and real-world data show that the methods would be accurate enough for assistance in performance measures and traffic signal control at intersections, even when the penetration rate of the probe vehicles is very low.


Yan Zhao (a), Jianfeng Zheng (b, †), Wai Wong (c), Xingmin Wang (c), Yuan Meng (b), Henry X. Liu (b, c, d)

(a) Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
(b) Didi Chuxing Inc., Beijing, China
(c) Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, USA
(d) University of Michigan Transportation Research Institute, University of Michigan, Ann Arbor, MI, USA


Keywords: Probe vehicle, Queue length estimation, Penetration rate, Traffic volume estimation

† Corresponding author, email: [email protected].

arXiv preprint, physics.soc-ph.

1 Introduction and Motivation

Traffic volumes and queue lengths are important performance measures for signalized intersections. Conventional approaches for traffic volume measurement and queue length estimation are primarily based on fixed-location sensors, such as loop detectors (Liu et al., 2009; Lee et al., 2015; An et al., 2018). However, the installation and maintenance of fixed-location sensors are very costly, which creates an urgent need for alternative data sources. This gap can now be filled, thanks to the rapid development of connected vehicle technology and the emergence of ride-hailing services. The global positioning system (GPS) devices on connected vehicles or the smartphones in ride-hailing vehicles can record the trajectories of these probe vehicles, providing rich information about the traffic conditions in transportation networks.

Based on the probe vehicle trajectory data, a wide range of methods have been proposed for estimating the queue lengths and traffic volumes at signalized intersections (Guo et al., 2019). A stream of literature solves the problem from the perspective of probability theory and statistics. Comert and Cetin (2009) showed that given the penetration rate of the probe vehicles and the distribution of queue lengths, the positions of the last probe vehicles in the queues alone would be sufficient for cycle-by-cycle queue length estimation. Comert and Cetin (2009) also analyzed the relationship between the probe vehicle market penetration ratio and estimation accuracy. Comert and Cetin (2011) extended their work to both spatial and temporal dimensions by considering the time when the probe vehicles joined the queues. In 2013, Comert studied the effect of the data from stop line detection (Comert, 2013a) and proposed another simple analytical model (Comert, 2013b). Li et al. (2013) formulated the dynamics of the queue length as a state transition process and employed a Kalman filter to estimate the queue length cycle by cycle. With the assumption of Poisson distribution, Comert (2016) summarized a series of methods of queue length estimation and penetration rate estimation and evaluated the estimators systematically.

As for traffic volume estimation, Zheng and Liu (2017) applied maximum likelihood estimation, assuming the vehicle arrivals at the intersections follow a time-varying Poisson process. The model was validated using the trajectory data collected from connected vehicles and taxis. Zhan et al. (2017) studied citywide traffic volume estimation using large-scale trajectory data, by combining some machine learning techniques and the traditional traffic flow theory. Wang et al. (2019) constructed a three-layer Bayesian network to capture the relationship between vehicle arrival processes and the timing information in probe vehicle trajectory data. The average arrival rate was inferred from the Bayesian network by applying the Expectation-Maximization algorithm.

There is also a stream of literature that applies the shockwave theory to probe vehicle data (Ban et al., 2011; Cetin, 2012; Hao et al., 2015; Hao and Ban, 2015; Ramezani and Geroliminis, 2015; Li et al., 2017; Rompis et al., 2018), or combines probe vehicle data and loop detector data (Badillo et al., 2012; Cai et al., 2014; Wang et al., 2017; Shahrbabaki et al., 2018), to estimate or predict the queue lengths. Since these studies are not closely related to this paper methodologically, they will not be introduced in detail.

Most of the existing literature introduced above on queue length estimation focuses on cycle-by-cycle estimation and requires prior information about the penetration rate of the probe vehicles and the distribution of queue lengths. However, this prior information is usually not available. Although a recent study by Wong et al. (2019a) proposed a novel method that provides an unbiased estimator for the probe vehicle penetration rate solely based on probe vehicle trajectory data, the method cannot handle the cases when some of the queues are empty. As for traffic volume estimation, the model developed by Zheng and Liu (2017) assumes the vehicle arrivals in each cycle follow a time-varying Poisson process, which might not be reasonable in over-saturation cases when the arrival process, the queueing process, and the departure process are all different. Although the method proposed by Zhan et al. (2017) can be applied at large scale, it requires the ground-truth traffic volume data on some road segments to build a connection between their high-level features and the actual volume categories, which implies that the method depends on not only the trajectory data but also other sources of data. Zhao et al. (2019) proposed a simplified method of finding the penetration rate of the probe vehicles based on Bayes' theorem. Extending the method in Zhao et al. (2019), this paper aims to propose a general framework and a series of methods that can estimate queue length and traffic volume both accurately and efficiently.

Estimating the states of the whole population from a small portion of it (Wong and Wong, 2015, 2016a), in nature, has to build a connection between the small portion and the whole population by their common features. When the traffic is flowing, it is difficult to infer how many regular vehicles are around the probe vehicles. Consequently, it is almost impossible to estimate the penetration rate of the probe vehicles in the traffic. However, when the vehicles are stopping at the intersections, because the empirical value of the space headway is usually around 7.5 m/veh, the number of vehicles in front of the last probe vehicle can be roughly inferred. Although the number of vehicles behind the last probe vehicle is still unknown, the incomplete information could still provide an opportunity to estimate the penetration rate of the probe vehicles. According to the penetration rate, the total queue length and the total traffic volume can be projected by scaling up the number of probe vehicles in the queues and in the traffic, respectively (Wong and Wong, 2016b; Wong et al., 2019b). The proposed methods in this paper take the stopping positions at the intersections as the common characteristics between the probe vehicles and the regular vehicles. Since the proposed methods have few external dependencies, they could overcome the limitations of the existing methods and be applied to a broader range of scenarios. The methods have been validated by both simulation and large-scale real-world data, showing good accuracy.

The rest of this paper is organized as follows. In Section 2, a detailed description of the problem will be given. Depending upon the existence of the probe vehicles, the queues over different cycles will be categorized into two classes: the observable queues (with probe vehicles) and the hidden queues (without probe vehicles). It will also be shown that the total traffic volume and the total queue length can be easily obtained once the probe vehicle penetration rate is known. Section 3 will present four different estimators of the total length of the observable queues. Section 4 will present two different estimators of the total length of the hidden queues. In Section 5, two methods for estimating the penetration rate of the probe vehicles will be proposed, which combine the various estimators presented in Section 3 and Section 4. The proposed methods are validated and evaluated in Section 6. Finally, there will be some concluding remarks in Section 7.

When vehicles stop at an intersection because of the traffic light, some vehicles in the queue might be probe vehicles whose trajectories are recorded by onboard GPS devices. For a specific movement and a specific time slot, the vehicle arrival process is assumed to be stationary, and the probe vehicles are assumed to be homogeneously mixed with other vehicles. Let $p$ denote the penetration rate of the probe vehicles, that is, when a vehicle is arbitrarily selected from the queue, its probability of being a probe vehicle is $p$, where $p \in (0, 1]$. Let $C$ denote the total number of cycles. In each cycle, the positions of the probe vehicles in the queue can be easily extracted from the trajectory data. The average space headway of vehicles stopped at the intersection is assumed to be known empirically, which is a common assumption in the relevant literature. Then, with the knowledge of the position of the stop bar, the number of vehicles in front of the last probe vehicle can also be inferred, although the number of vehicles behind the last probe vehicle is still unknown. Denote the queue length in the $i$th cycle by a random variable $Q_i$, $\forall i \in \{1, 2, \ldots, C\}$. Denote the number of probe vehicles in the $i$th cycle by a random variable $N_i$. Denote the observed partial queue in the $i$th cycle by a tuple $q_i$ consisting of "0"s and "1"s, which represent regular vehicles and probe vehicles, respectively. Denote the length of the observed partial queue by $|q_i|$. Apparently, $Q_i \geq |q_i| \geq N_i$, $\forall i \in \{1, 2, \ldots, C\}$.

Figure 1 illustrates what can be easily inferred from the trajectory data. Six of the nine queues in the figure are (partially) observable because of the probe vehicles in them; the other three are hidden because they contain no probe vehicles. Denote the total length of the observable queues and the total length of the hidden queues by $Q^{\mathrm{obs}}$ and $Q^{\mathrm{hid}}$, respectively. In Figure 1, $Q^{\mathrm{obs}} = 30$ is the sum of the six observable queue lengths and $Q^{\mathrm{hid}} = 7$ is the sum of the three hidden queue lengths.
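The construction of the observed partial queue can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `observed_partial_queue` is hypothetical, stop distances are assumed to be measured in metres from the stop bar (0 m for the first vehicle), and the headway of 7.5 m/veh is the empirical value mentioned in the introduction.

```python
from typing import List, Tuple


def observed_partial_queue(stop_distances_m: List[float],
                           headway_m: float = 7.5) -> Tuple[int, ...]:
    """Build the tuple q_i (0 = regular vehicle, 1 = probe vehicle).

    Assumption: a probe vehicle stopped d metres behind the stop bar
    occupies queue position round(d / headway) + 1.
    """
    # Convert each stop distance to a 1-based position in the queue.
    probe_slots = {int(round(d / headway_m)) + 1 for d in stop_distances_m}
    if not probe_slots:
        return ()  # hidden queue: no probe vehicle was observed
    last = max(probe_slots)  # position of the last probe vehicle
    # Everything in front of the last probe vehicle is known; the rest is not.
    return tuple(1 if k in probe_slots else 0 for k in range(1, last + 1))


q = observed_partial_queue([15.2, 37.0])
n_i = sum(q)                    # number of probe vehicles
s_i = q.index(1) + 1            # position of the first probe vehicle
t_i = len(q)                    # position of the last probe vehicle
print(q, n_i, s_i, t_i)
```

In this sketch the tuple always ends at the last probe vehicle, so its length equals the position of that vehicle, and the vehicles behind it remain unknown.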
In the $i$th cycle, if the queue is observable, then denote the positions of the first and the last probe vehicles by $S_i$ and $T_i$, respectively.

Define a binary random variable $X_i^l$ to indicate if the queue length in the $i$th cycle is $l$, that is,

$$X_i^l = \begin{cases} 1, & Q_i = l, \\ 0, & Q_i \neq l, \end{cases} \tag{1}$$

where $l \in \{0, 1, \ldots, L_{\max}\}$ and $L_{\max}$ is an upper bound of the queue length. Denote the number of queues of length $l$ in all the cycles by $C_l$. Obviously, $C = \sum_{l=0}^{L_{\max}} C_l$ and $C_l = \sum_{i=1}^{C} X_i^l$.

Figure 1: Observation process

To estimate the traffic volume and the total queue length for a specific movement and a specific time slot, the key step is to find the penetration rate $p$. Denote the total number of probe vehicles in the queues by $Q^{\mathrm{probe}}$ and denote the traffic volume of probe vehicles by $V^{\mathrm{probe}}$. Since $Q^{\mathrm{probe}}$ and $V^{\mathrm{probe}}$ can be easily obtained from the trajectory data by counting the number of probe vehicles in the queues and in the traffic flows, once $p$ is known, equation (2) and equation (3) can give an estimate of the total queue length and the total traffic volume, respectively (Wong et al., 2019b; Wong and Wong, 2019; Zhao et al., 2019).

$$\hat{Q}^{\mathrm{all}} = \frac{Q^{\mathrm{probe}}}{p}. \tag{2}$$

$$\hat{V}^{\mathrm{all}} = \frac{V^{\mathrm{probe}}}{p}. \tag{3}$$

Table 1 summarizes the notations defined above.

Table 1: Notations

| Notation | Description |
|---|---|
| $C$ | The total number of cycles |
| $Q_i$ | The queue length in the $i$th cycle |
| $q_i$ | The observed partial queue in the $i$th cycle |
| $N_i$ | The number of probe vehicles in the $i$th cycle |
| $S_i$ | The position of the first probe vehicle in the $i$th cycle |
| $T_i$ | The position of the last probe vehicle in the $i$th cycle |
| $X_i^l$ | A binary variable to indicate if the queue length in the $i$th cycle is $l$ |
| $C_l$ | The total number of queues of length $l$ |
| $L_{\max}$ | An upper bound of the queue length |
| $Q^{\mathrm{obs}}$ | The total length of all the (partially) observable queues |
| $Q^{\mathrm{hid}}$ | The total length of the hidden queues |
| $Q^{\mathrm{probe}}$ | The total number of probe vehicles in all the queues |
| $V^{\mathrm{probe}}$ | The traffic volume of probe vehicles |

3 Estimation of $Q^{\mathrm{obs}}$

$Q^{\mathrm{obs}}$ can be estimated through two approaches. Estimators 1, 2, and 3 are based on the fact that the probe vehicles are expected to segregate the regular vehicles equally. These estimators only require the number of stopping probe vehicles in each cycle and the stopping positions of the first and the last probe vehicles in the queues, all of which can be easily extracted from the trajectory data. Therefore, the estimators are constant values. By contrast, estimator 4 is based on Bayes' theorem, which relies on the penetration rate $p$. Thus, estimator 4 is a function of $p$.

Theorem 1. $\forall n_i \geq 1$, given that $N_i = n_i$ in the $i$th cycle,

$$E(Q_i \mid N_i = n_i) = E(S_i \mid N_i = n_i)(n_i + 1) - 1. \tag{4}$$

The proof is in Appendix A.

Theorem 1 states that given the number of probe vehicles in an observable queue, the expected queue length can be obtained from the expected stopping position of the first probe vehicle. Based on Theorem 1, given the number of probe vehicles in each cycle, the expected total length of the observable queues can be expressed as

\begin{align}
\sum_{i: n_i \neq 0} E(Q_i \mid N_i = n_i) &= \sum_{i: n_i \neq 0} \left( E(S_i \mid N_i = n_i)(n_i + 1) - 1 \right) \tag{5} \\
&= \sum_{i: n_i \neq 0} E(S_i \mid N_i = n_i)(n_i + 1) - \sum_{i: n_i \neq 0} 1 \tag{6} \\
&= \sum_{j=1}^{L_{\max}} \sum_{i: n_i = j} E(S_i \mid N_i = j)(j + 1) - \sum_{i: n_i \neq 0} 1 \tag{7} \\
&= \sum_{j=1}^{L_{\max}} (j + 1) \sum_{i: n_i = j} E(S_i \mid N_i = j) - \sum_{i: n_i \neq 0} 1. \tag{8}
\end{align}

Therefore, given the position of the first stopping probe vehicle $S_i = s_i$ in the $i$th cycle, $\forall i \in \{1, 2, \ldots, C\}$, by substituting the sample mean $\frac{\sum_{i: n_i = j} s_i}{\sum_{i: n_i = j} 1}$ for the expected value $E(S_i \mid N_i = j)$, $\forall j \geq 1$, $Q^{\mathrm{obs}}$ can be estimated by

\begin{align}
\hat{Q}^{\mathrm{obs}}_1 &= \sum_{j=1}^{L_{\max}} (j + 1) \sum_{i: n_i = j} s_i - \sum_{i: n_i \neq 0} 1 \tag{9} \\
&= \sum_{j=1}^{L_{\max}} \sum_{i: n_i = j} s_i (j + 1) - \sum_{i: n_i \neq 0} 1 \tag{10} \\
&= \sum_{i: n_i \neq 0} s_i (n_i + 1) - \sum_{i: n_i \neq 0} 1 \tag{11} \\
&= \sum_{i: n_i \neq 0} \left( s_i (n_i + 1) - 1 \right). \tag{12}
\end{align}

Theorem 2. $\forall n_i \geq 1$, given that $N_i = n_i$ in the $i$th cycle,

$$E(Q_i \mid N_i = n_i) = E(T_i \mid N_i = n_i) \frac{n_i + 1}{n_i} - 1. \tag{13}$$

The proof is in Appendix A.

Theorem 2 states that given the number of probe vehicles in an observable queue, the expected queue length can be obtained from the expected stopping position of the last probe vehicle. Based on Theorem 2, given the number of probe vehicles in each cycle, the expected total length of observable queues can be expressed as

$$\sum_{i: n_i \neq 0} E(Q_i \mid N_i = n_i) = \sum_{i: n_i \neq 0} \left( E(T_i \mid N_i = n_i) \frac{n_i + 1}{n_i} - 1 \right). \tag{14}$$

Following a derivation similar to that of estimator 1, given the position of the last stopping probe vehicle $T_i = t_i$ in the $i$th cycle, $\forall i \in \{1, 2, \ldots, C\}$, by substituting the sample mean $\frac{\sum_{i: n_i = j} t_i}{\sum_{i: n_i = j} 1}$ for the expected value $E(T_i \mid N_i = j)$, $\forall j \geq 1$, $Q^{\mathrm{obs}}$ can be estimated by

$$\hat{Q}^{\mathrm{obs}}_2 = \sum_{i: n_i \neq 0} \left( t_i \frac{n_i + 1}{n_i} - 1 \right). \tag{15}$$

Theorem 3. $\forall n_i \geq 1$, given that $N_i = n_i$ in the $i$th cycle,

$$E(Q_i \mid N_i = n_i) = E(S_i \mid N_i = n_i) + E(T_i \mid N_i = n_i) - 1, \tag{16}$$

$$E(Q_i \mid N_i \geq 1) = E(S_i \mid N_i \geq 1) + E(T_i \mid N_i \geq 1) - 1. \tag{17}$$

The proof is in Appendix A.

Theorem 3 states that given the number of probe vehicles in an observable queue, the expected queue length can be obtained from the expected stopping positions of the first and the last probe vehicles. Based on Theorem 3, given the number of probe vehicles in each cycle, the expected total length of the observable queues can be expressed as

$$\sum_{i: n_i \neq 0} E(Q_i \mid N_i = n_i) = \sum_{i: n_i \neq 0} \left( E(S_i \mid N_i = n_i) + E(T_i \mid N_i = n_i) - 1 \right). \tag{18}$$

Therefore, by substituting the sample means $\frac{\sum_{i: n_i = j} s_i}{\sum_{i: n_i = j} 1}$ and $\frac{\sum_{i: n_i = j} t_i}{\sum_{i: n_i = j} 1}$ for the expected values $E(S_i \mid N_i = j)$ and $E(T_i \mid N_i = j)$, $\forall j \geq 1$, respectively, $Q^{\mathrm{obs}}$ can be estimated by

$$\hat{Q}^{\mathrm{obs}}_3 = \sum_{i: n_i \neq 0} (s_i + t_i - 1). \tag{19}$$

The mechanism behind $\hat{Q}^{\mathrm{obs}}_3$ is intuitive. Take Figure 2 for example. The queue in the $k$th cycle is the reverse of the queue in the $j$th cycle, which implies that the number of vehicles behind the last probe vehicle in the $j$th cycle is equal to the number of vehicles in front of the first probe vehicle in the $k$th cycle. Because of the symmetry, these two queues have the same probability of occurring. Therefore, even though the number of vehicles behind the last probe vehicle in a cycle is unknown, as long as the sample size is sufficient, the missing number could be compensated by the number of vehicles in front of the first probe vehicle in another cycle. Essentially, $\hat{Q}^{\mathrm{obs}}_3$ is obtained by summing up the position of the last probe vehicle $t_i$ and the number of vehicles in front of the first probe vehicle $s_i - 1$, which could be regarded as a compensation of the missing vehicles in the rear.

Figure 2: The missing information compensated by another queue

3.4 Estimator 4 based on Bayes' theorem

Given all the observed partial queues, as derived in Zhao et al. (2019), the conditional expectation of the total length of the observable queues can be expressed as

\begin{align}
\sum_{i: n_i \neq 0} E(Q_i \mid q_i) &= \sum_{i: n_i \neq 0} \sum_{l=1}^{L_{\max}} \frac{P(Q_i = l) P(q_i \mid Q_i = l)}{\sum_{j=0}^{L_{\max}} P(Q_i = j) P(q_i \mid Q_i = j)} \, l \tag{20} \\
&= \sum_{i: n_i \neq 0} \sum_{l=|q_i|}^{L_{\max}} \frac{E(C_l) \, p^{n_i} (1 - p)^{l - n_i}}{\sum_{j=|q_i|}^{L_{\max}} E(C_j) \, p^{n_i} (1 - p)^{j - n_i}} \, l \tag{21} \\
&= \sum_{i: n_i \neq 0} \sum_{l=|q_i|}^{L_{\max}} \frac{p E(C_l)}{\sum_{j=|q_i|}^{L_{\max}} p E(C_j) (1 - p)^{j - l}} \, l. \tag{22}
\end{align}

$C_l$, the number of cycles with queues of length $l$, equals the count of stopping vehicles at position $l$ minus the count of stopping vehicles at position $l + 1$, as illustrated by the first two diagrams in Figure 3. Since the probe vehicles are assumed to be homogeneously mixed with other vehicles, the histogram of the stopping positions of the probe vehicles is a $p$ scaled-down version of the histogram of the stopping positions of all the vehicles. Therefore, $\hat{C}_l = \bar{c}_l - \bar{c}_{l+1}$, where $\bar{c}_l$ is the count of stopping probe vehicles at position $l$ and $\bar{c}_{l+1}$ is the count of stopping probe vehicles at position $l + 1$, can be used to approximate $p E(C_l)$. When the difference is negative, a least-squares method can be applied to ensure the nonnegativity of $\hat{C}_l$ (Zhao et al., 2019).

Figure 3: The relationship between the distributions of queue lengths and stopping positions

Once $\hat{C}_l$ is obtained, replacing $p E(C_l)$ in equation (22) by its approximation $\hat{C}_l$ gives an estimate of $Q^{\mathrm{obs}}$,

$$\hat{Q}^{\mathrm{obs}}_4(p) = \sum_{i: n_i \neq 0} \sum_{l=|q_i|}^{L_{\max}} \frac{\hat{C}_l}{\sum_{j=|q_i|}^{L_{\max}} \hat{C}_j (1 - p)^{j - l}} \, l, \tag{23}$$

which is a function of the penetration rate $p$.

4 Estimation of $Q^{\mathrm{hid}}$

After estimating $Q^{\mathrm{obs}}$, the following question is how to estimate $Q^{\mathrm{hid}}$, as there is no probe vehicle in the corresponding cycles.
Fortunately, the fact that no probe vehicle is in the queues also contains information. In this section, two estimators of $Q^{\mathrm{hid}}$ will be presented. Similar to $\hat{Q}^{\mathrm{obs}}_4(p)$, estimator 1 of $Q^{\mathrm{hid}}$ applies Bayes' theorem to the hidden queues directly. Estimator 2 utilizes the ratio between the probability of being observable and the probability of being hidden for each queue, to estimate the total length of the hidden queues.

4.1 Estimator 1 based on Bayes' theorem

Similar to equation (22), given the fact that no probe vehicle is observed in the hidden queues, the expected total length of the hidden queues can be expressed as

\begin{align}
\sum_{i: n_i = 0} E(Q_i \mid q_i) &= \sum_{i: n_i = 0} \sum_{l=0}^{L_{\max}} \frac{P(Q_i = l) P(q_i \mid Q_i = l)}{\sum_{j=0}^{L_{\max}} P(Q_i = j) P(q_i \mid Q_i = j)} \, l \tag{24} \\
&= \sum_{i: n_i = 0} \sum_{l=0}^{L_{\max}} \frac{p E(C_l)}{\sum_{j=0}^{L_{\max}} p E(C_j) (1 - p)^{j - l}} \, l. \tag{25}
\end{align}

Therefore, an estimator of $Q^{\mathrm{hid}}$ can be given by

$$\hat{Q}^{\mathrm{hid}}_1(p) = \sum_{i: n_i = 0} \sum_{l=0}^{L_{\max}} \frac{\hat{C}_l}{\sum_{j=0}^{L_{\max}} \hat{C}_j (1 - p)^{j - l}} \, l. \tag{26}$$

Please note that, different from equation (23), the summation over $l$ in equation (26) starts from 0, because when $q_i$ is an empty tuple,

$$P(q_i \mid Q_i = l) = (1 - p)^l. \tag{27}$$

$\hat{C}_0$, an estimate of $p E(C_0)$, can be found as follows. Among all the queues, the expected count of queues of length 0 is

$$E(C_0) = C - \sum_{l=1}^{L_{\max}} E(C_l). \tag{28}$$

Therefore, multiplying both sides of the equation by $p$ gives

$$p E(C_0) = pC - \sum_{l=1}^{L_{\max}} p E(C_l). \tag{29}$$

$\hat{C}_0$, an estimate of $p E(C_0)$, can be easily given by

$$\hat{C}_0 = pC - \sum_{l=1}^{L_{\max}} \hat{C}_l. \tag{30}$$

All the parameters except $p$ on the right-hand side of equation (26) can be calculated; therefore, $\hat{Q}^{\mathrm{hid}}_1(p)$ is a function of only $p$. If stop line detection data (such as loop detector data) are available, $\hat{C}_0$ can be more easily obtained.

Among the observable queues, $\forall l \in \{1, 2, \ldots, L_{\max}\}$, the expected count of queues of length $l$ can be expressed as

\begin{align}
\sum_{i: n_i \neq 0} E\left(X_i^l \mid q_i\right) &= \sum_{i: n_i \neq 0} \left( P(X_i^l = 1 \mid q_i) \cdot 1 + P(X_i^l = 0 \mid q_i) \cdot 0 \right) \tag{31} \\
&= \sum_{i: n_i \neq 0} \left( P(Q_i = l \mid q_i) \cdot 1 + P(Q_i \neq l \mid q_i) \cdot 0 \right) \tag{32} \\
&= \sum_{i: n_i \neq 0} P(Q_i = l \mid q_i). \tag{33}
\end{align}

For a queue of length $l$, the probability of being hidden (without any probe vehicle) is $(1 - p)^l$; the probability of being observed (with at least one probe vehicle) is $1 - (1 - p)^l$. Therefore, the expected total length of the hidden queues can be estimated by

\begin{align}
\sum_{l=1}^{L_{\max}} \left( \frac{(1 - p)^l}{1 - (1 - p)^l} \sum_{i: n_i \neq 0} E\left(X_i^l \mid q_i\right) \right) l &= \sum_{l=1}^{L_{\max}} \frac{(1 - p)^l}{1 - (1 - p)^l} \sum_{i: n_i \neq 0} P(Q_i = l \mid q_i) \, l \tag{34} \\
&= \sum_{i: n_i \neq 0} \sum_{l=1}^{L_{\max}} \frac{(1 - p)^l}{1 - (1 - p)^l} \, \frac{P(Q_i = l) P(q_i \mid Q_i = l)}{\sum_{j=0}^{L_{\max}} P(Q_i = j) P(q_i \mid Q_i = j)} \, l \tag{35} \\
&= \sum_{i: n_i \neq 0} \sum_{l=|q_i|}^{L_{\max}} \frac{(1 - p)^l}{1 - (1 - p)^l} \, \frac{p E(C_l)}{\sum_{j=|q_i|}^{L_{\max}} p E(C_j) (1 - p)^{j - l}} \, l. \tag{36}
\end{align}

Then, an estimator of $Q^{\mathrm{hid}}$, the total length of the hidden queues, can be defined as

$$\hat{Q}^{\mathrm{hid}}_2(p) = \sum_{i: n_i \neq 0} \sum_{l=|q_i|}^{L_{\max}} \frac{(1 - p)^l}{1 - (1 - p)^l} \, \frac{\hat{C}_l}{\sum_{j=|q_i|}^{L_{\max}} \hat{C}_j (1 - p)^{j - l}} \, l. \tag{37}$$

In this section, two different methods for penetration rate estimation will be presented. The methodology is to establish an equation with only a single unknown variable $p$ using the estimators developed in the previous sections. Then, an estimate of $p$ can be obtained by solving the equation. Method 1 is based upon the equivalence between the different estimators. Method 2 exploits the fact that the portion of probe vehicles in the queues is approximately equal to the penetration rate.

When estimating $Q^{\mathrm{obs}}$, estimators 1, 2, and 3 generate constant results, whereas estimator 4 is a function of $p$. Since the four estimators are of the same variable $Q^{\mathrm{obs}}$, it is intuitive to establish the following single-variable equation

$$\hat{Q}^{\mathrm{obs}}_i = \hat{Q}^{\mathrm{obs}}_4(p), \quad \forall i = 1, 2, 3. \tag{38}$$

Solving the equation will yield an estimate of the penetration rate $p$.
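The constant estimators 1, 2, and 3 that appear on the left-hand side of equation (38) are plain sums over the observable queues. The following Python sketch shows one way to compute them; the function name `constant_estimators` is hypothetical, and each observed partial queue is assumed to be a tuple of 0s (regular vehicles) and 1s (probe vehicles), with an empty tuple for a hidden queue.

```python
from typing import Sequence, Tuple


def constant_estimators(queues: Sequence[Tuple[int, ...]]) -> Tuple[float, float, float]:
    """Return estimators 1, 2, and 3 of the total observable queue length."""
    est1 = est2 = est3 = 0.0
    for q in queues:
        n = sum(q)                       # number of probe vehicles n_i
        if n == 0:
            continue                     # hidden queue: contributes nothing here
        s = q.index(1) + 1               # position of the first probe vehicle s_i
        t = len(q) - q[::-1].index(1)    # position of the last probe vehicle t_i
        est1 += s * (n + 1) - 1          # estimator 1: s_i (n_i + 1) - 1
        est2 += t * (n + 1) / n - 1      # estimator 2: t_i (n_i + 1) / n_i - 1
        est3 += s + t - 1                # estimator 3: s_i + t_i - 1
    return est1, est2, est3


example = [(0, 0, 1, 0, 0, 1), (1,), (), (0, 1, 1)]
print(constant_estimators(example))
```

For the four example cycles the three estimators give 14, 12.5, and 13 vehicles; the third cycle is hidden and is skipped by all three.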
Similarly, when estimating $Q^{\mathrm{hid}}$, both estimator 1 and estimator 2 are functions of $p$. Therefore, another single-variable equation can be given by

$$\hat{Q}^{\mathrm{hid}}_1(p) = \hat{Q}^{\mathrm{hid}}_2(p). \tag{39}$$

A more general formulation of this method can be expressed as follows.

$$\hat{Q}^{\mathrm{obs}}_i(p) + \hat{Q}^{\mathrm{hid}}_j(p) = \hat{Q}^{\mathrm{obs}}_m(p) + \hat{Q}^{\mathrm{hid}}_n(p). \tag{40}$$

As long as it is an equation with a single unknown variable $p$, solving it will give an estimate of the penetration rate. Both the left-hand side and the right-hand side of equation (40) can be regarded as estimators of the total queue length.

Another way to establish a single-variable equation for $p$ is shown by equation (41).

$$\frac{Q^{\mathrm{probe}}}{\hat{Q}^{\mathrm{obs}}_i(p) + \hat{Q}^{\mathrm{hid}}_j(p)} = p, \quad \forall i = 1, 2, 3, 4, \ \forall j = 1, 2. \tag{41}$$

The left-hand side of equation (41) could be interpreted as an estimate of the portion of probe vehicles in the queues. The right-hand side is the penetration rate, which should be approximately equal to the left-hand side. Similarly, solving this equation yields an estimate of $p$.

In practice, it is usually hard to find $p$ by solving equation (38), (39), (40), or (41) directly. Instead, an iterative algorithm should be applied. One may search $p$ from an upper bound to 0 with a small step size until the difference between the left-hand side and the right-hand side reaches certain stopping criteria. The upper bound can be taken as $\frac{Q^{\mathrm{probe}}}{\sum_i |q_i|}$ since it is an overestimate of the penetration rate $p$.

Once $p$ is estimated, equation (2) and equation (3) can be used to estimate the total queue length and the total traffic volume, respectively.

The focus of this test is on the estimation of penetration rate and queue length. Unlike the existing methods (Comert and Cetin, 2009; Comert, 2016; Zheng and Liu, 2017), the proposed methods in this paper do not require prior information about the penetration rate and the queue length distribution. For demonstration purposes, the testing dataset is generated by a simulation of Poisson processes, although any other stochastic process can also be applied. The penetration rate of the probe vehicles is enumerated from 0.01 to 0.99 with a step size of 0.01 in each test, in order to test the robustness of the proposed methods.
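To make this kind of test concrete, the sketch below simulates Poisson queue lengths, marks each vehicle as a probe with probability $p$, and then solves equation (41) with estimator 3 of $Q^{\mathrm{obs}}$ and estimator 2 of $Q^{\mathrm{hid}}$ by scanning $p$ downward from the upper bound. It is a simplified illustration, not the authors' implementation: the function names are hypothetical, negative values of $\hat{C}_l$ are simply clipped to zero instead of using the least-squares step, and the stopping criterion is the first $p$ at which the left-hand side is no longer below $p$.

```python
import math
import random
from collections import Counter


def simulate(cycles, lam, p, seed=0):
    """Return, per cycle, the sorted queue positions of the probe vehicles."""
    rng = random.Random(seed)
    limit = math.exp(-lam)
    observed = []
    for _ in range(cycles):
        # Knuth's method for one Poisson(lam) draw (adequate for moderate lam).
        length, prod = 0, rng.random()
        while prod > limit:
            length += 1
            prod *= rng.random()
        observed.append([k for k in range(1, length + 1) if rng.random() < p])
    return observed


def estimate_p_method2(observed, step=0.001):
    """Scan p downward to solve Q_probe / (Q_obs_3 + Q_hid_2(p)) = p."""
    probes = [pos for pos in observed if pos]        # observable queues only
    q_probe = sum(len(pos) for pos in probes)
    first = [pos[0] for pos in probes]               # s_i
    last = [pos[-1] for pos in probes]               # t_i, equal to |q_i|
    l_max = max(last)
    q_obs3 = sum(s + t - 1 for s, t in zip(first, last))
    c_bar = Counter(k for pos in probes for k in pos)  # probe stops per position
    # C_hat[l] approximates p * E(C_l); clipping is a stand-in for least squares.
    c_hat = [0.0] + [max(0.0, c_bar[l] - c_bar[l + 1]) for l in range(1, l_max + 1)]
    last_counts = Counter(last)

    def q_hid2(p):
        # Posterior weight of length l is proportional to C_hat[l] * (1 - p)^l.
        w = [c_hat[l] * (1 - p) ** l for l in range(l_max + 1)]
        total = 0.0
        for t, count in last_counts.items():
            denom = sum(w[t:])
            num = sum(w[l] * l * (1 - p) ** l / (1 - (1 - p) ** l)
                      for l in range(t, l_max + 1))
            total += count * num / denom
        return total

    p = min(q_probe / sum(last), 1.0 - step)         # upper bound on p
    while p > step:
        if q_probe / (q_obs3 + q_hid2(p)) >= p:      # both sides have met
            return p
        p -= step
    return p


observed = simulate(cycles=1000, lam=10.0, p=0.2, seed=1)
p_hat = estimate_p_method2(observed)
print(round(p_hat, 3))
```

With 1,000 cycles and an arrival rate of 10, the estimate should land close to the true value of 0.2, and dividing the probe count by `p_hat` then projects the total queue length as in equation (2).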

Figure 4 shows the results of penetration rate estimation using six different submethods introduced in Section 5. The simulation data are generated by a Poisson process with an average arrival rate during the red phase $\lambda = 10$ for 1,000 cycles. The horizontal axes represent the ground truth of the penetration rates. The vertical axes represent the estimated values. The accuracy measure used is the mean absolute percentage error (MAPE). As Figure 4 shows, the dots in blue are very close to the diagonals, which implies that the methods can estimate the penetration rate very accurately. Figure 5 shows the results of queue length estimation using the different submethods. The horizontal axes represent the penetration rates, and the vertical axes represent the estimated average queue lengths. The results show that the higher the penetration rate is, the better the estimation results tend to be. This is intuitive because when the penetration rate is very low, only a tiny portion of vehicles can be observed. By contrast, if the penetration rate is very close to 100%, there will be little missing information and the estimation results would be more accurate.

In general, method 2 outperforms method 1. To better understand the mechanism behind method 2, define an inverse proportional function $f(x) = \frac{M}{x}$, where $M$ is a positive constant. When $x \gg \sqrt{M}$, the absolute value of the derivative is $|f'(x)| = \frac{M}{x^2} \ll 1$. In equation (41), $M$ corresponds to $Q^{\mathrm{probe}}$ and $x$ corresponds to $\hat{Q}^{\mathrm{obs}}_i(p) + \hat{Q}^{\mathrm{hid}}_j(p)$, which is much larger than $\sqrt{Q^{\mathrm{probe}}}$. Therefore, due to the property of the inverse proportional function, the error in $\hat{Q}^{\mathrm{obs}}_i(p) + \hat{Q}^{\mathrm{hid}}_j(p)$ only results in an error of $p$ which is orders of magnitude smaller. That is why method 2 generally outperforms method 1.

Among the estimators of $Q^{\mathrm{obs}}$, $\hat{Q}^{\mathrm{obs}}_1$ scales up the stopping positions of the first probe vehicle by a relatively large scaling factor $(n_i + 1)$ in each cycle, and thus usually results in large variances when estimating $Q^{\mathrm{obs}}$. $\hat{Q}^{\mathrm{obs}}_2$ scales up the stopping positions of the last probe vehicle with a relatively smaller scaling factor $\frac{n_i + 1}{n_i}$, which results in smaller variances than $\hat{Q}^{\mathrm{obs}}_1$. $\hat{Q}^{\mathrm{obs}}_3$ estimates $Q^{\mathrm{obs}}$ by summing up the stopping positions of the first and the last probe vehicles in each cycle. Since there is no scaling-up factor, the estimation accuracy is even better. $\hat{Q}^{\mathrm{obs}}_4$ is a function of the penetration rate $p$. The queue length distribution required in the calculation is approximated by aggregating the stopping positions of all the probe vehicles. The performance of $\hat{Q}^{\mathrm{obs}}_4$ is similar to that of $\hat{Q}^{\mathrm{obs}}_3$. As for the estimators of $Q^{\mathrm{hid}}$, $\hat{Q}^{\mathrm{hid}}_2$ generally has an edge over $\hat{Q}^{\mathrm{hid}}_1$, as it usually gives better results. In addition, $\hat{Q}^{\mathrm{hid}}_1$ requires signal timing information such as the number of cycles, which is not necessarily needed by $\hat{Q}^{\mathrm{hid}}_2$.

Figure 4: The results of penetration rate estimation using different methods
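The sensitivity argument for method 2 can be made concrete with a short worked example. The numbers are illustrative assumptions rather than results from the paper; $M$ stands for $Q^{\mathrm{probe}}$ and $x$ for the estimated total queue length in the denominator of equation (41).

```latex
\begin{align*}
p &= f(x) = \frac{M}{x}, \qquad
\delta p \approx f'(x)\,\delta x = -\frac{M}{x^{2}}\,\delta x = -\frac{p}{x}\,\delta x. \\
\text{Assume } M &= 2000,\; x = 10000 \;\Rightarrow\; p = 0.2, \qquad \sqrt{M} \approx 44.7 \ll x. \\
\text{If } \delta x &= 500: \quad
\delta p \approx -\frac{0.2}{10000} \times 500 = -0.01, \qquad
\text{exactly } \frac{2000}{10500} - 0.2 \approx -0.0095.
\end{align*}
```

Under these assumed values, an error of 500 vehicles in the estimated total queue length shifts the estimated penetration rate by only about 0.01.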

Figure 5: The results of queue length estimation using different methods

In order to demonstrate the impact of sample size on the estimation accuracy, data of 100 cycles, 200 cycles, 500 cycles, and 1,000 cycles are used in four rounds of tests, respectively. The submethod $\frac{Q^{\mathrm{probe}}}{\hat{Q}^{\mathrm{obs}}(p) + \hat{Q}^{\mathrm{hid}}(p)} = p$ is applied. The results in Figure 6 and Figure 7 show that better results can be obtained when the sample size is larger.

Figure 6: The results of penetration rate estimation with different sample sizes

Figure 7: The results of queue length estimation with diﬀerent sample sizes

To study the impact of the arrival rate on the estimation accuracy, the same submethod is applied to four different Poisson processes whose average arrival rates are 3, 5, 10, and 15, respectively. In each test, 1,000 cycles of data are used. The results in Figure 8 and Figure 9 show that the larger the arrival rate is, the more accurate the estimation tends to be. The reason is that a higher arrival rate implies more observations of the probe vehicles, which could generally improve the estimation accuracy.

Figure 8: The results of penetration rate estimation with different arrival rates

Figure 9: The results of queue length estimation with different arrival rates
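The two sensitivity studies above (sample size and arrival rate) can be mimicked with a small Monte Carlo experiment. The sketch below is not the submethod used in the paper. As a simplified stand-in, it evaluates only the first-plus-last estimator of $Q^{obs}$ from Theorem 3, $\sum_i (s_i + t_i - 1)$, against the true total queue length over the cycles that contain at least one probe vehicle. It assumes Poisson queue lengths and independent probe vehicles with probability `p`; all function names are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def relative_error_q_obs(rng: random.Random, lam: float, p: float, cycles: int) -> float:
    """Relative error of sum(s_i + t_i - 1) over cycles with at least one probe."""
    true_total = 0
    est_total = 0
    for _ in range(cycles):
        q = poisson(rng, lam)
        # Positions (1 = stop bar) of the vehicles that happen to be probes.
        probes = [pos for pos in range(1, q + 1) if rng.random() < p]
        if not probes:
            continue  # hidden queue: nothing is observed in this cycle
        true_total += q
        est_total += probes[0] + probes[-1] - 1
    if true_total == 0:
        raise ValueError("no probe vehicle was observed")
    return abs(est_total - true_total) / true_total


def mean_error(lam: float, p: float, cycles: int, reps: int = 30, seed: int = 0) -> float:
    """Average the relative error over independent replications."""
    rng = random.Random(seed)
    return sum(relative_error_q_obs(rng, lam, p, cycles) for _ in range(reps)) / reps


if __name__ == "__main__":
    for cycles in (100, 200, 500, 1000):
        print(f"lambda=10, cycles={cycles}: {mean_error(10.0, 0.2, cycles):.4f}")
    for lam in (3.0, 5.0, 10.0, 15.0):
        print(f"lambda={lam}, cycles=1000: {mean_error(lam, 0.2, 1000):.4f}")
```

Under these assumptions the average relative error shrinks both when more cycles are aggregated and when the arrival rate is higher, which is the same qualitative trend reported for the full method.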

In previous subsections, it is assumed that the queue in each cycle can be entirely discharged, that is, there are no overflow queues. In the real world, the number of vehicles arriving at an intersection in a cycle might exceed the number of vehicles the traffic signal can serve. To investigate the impact of the overflow queues on the estimation accuracy, the cases with overflow queues are also simulated. The simulation set-up of the overflow queues is similar to Comert and Cetin (2009). The average arrival rates in the green phase and in the red phase are both set to 10. The maximum number of vehicles that can be served in each cycle is set to 22. The estimation results for penetration rates and queue lengths are shown in Figure 10. Since the simulation captures the effect of overflow queues, the average queue length is different from the average arrival rate in the red phase.

Figure 10: The estimation results of the cases with overflow queues: (a) penetration rate estimation, (b) queue length estimation
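The statement that overflow makes the average queue length differ from the red-phase arrival rate can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the set-up of Comert and Cetin (2009), which is not reproduced here: arrivals in the red and green phases are independent Poisson variates with mean 10, at most 22 vehicles are served per cycle, and the queue is measured at the end of the red phase as the residual queue plus the red-phase arrivals. The function names are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_overflow(cycles: int, lam_red: float = 10.0, lam_green: float = 10.0,
                      capacity: int = 22, seed: int = 0) -> list[int]:
    """Return the queue length at the end of red for each simulated cycle."""
    rng = random.Random(seed)
    residual = 0  # vehicles left over from the previous cycle
    queues = []
    for _ in range(cycles):
        red_arrivals = poisson(rng, lam_red)
        green_arrivals = poisson(rng, lam_green)
        queue_at_end_of_red = residual + red_arrivals
        queues.append(queue_at_end_of_red)
        # Anything beyond the per-cycle capacity spills into the next cycle.
        residual = max(0, queue_at_end_of_red + green_arrivals - capacity)
    return queues


if __name__ == "__main__":
    queues = simulate_overflow(5000)
    print(f"average queue at end of red: {sum(queues) / len(queues):.2f}")
```

Because the residual queue is never negative and is sometimes positive, the simulated average queue exceeds the red-phase arrival rate of 10.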

The proposed methods are also tested using real-world data. The focus of this test is on traffic volume estimation. Queue length estimation is not validated using real-world data because the ground truth of queue lengths is not available. The trajectory data are collected by Didi Chuxing from the vehicles offering its ride-hailing services in an area in Suzhou, Jiangsu Province, China, shown in Figure 11. The data of the 15 workdays from May 8, 2018, to May 28, 2018, are used for validation. The GPS trajectories of the Didi vehicles in the selected area are mapped onto the transportation network by a map matching algorithm (Newson and Krumm, 2009). For each movement and each one-hour time slot, the “snapshots” of the trajectory data are taken to extract the observed partial queues. Due to the limited accuracy of the trajectory data, the average space headway for the queueing vehicles could not be easily estimated. Therefore, its value is empirically set to 7.5 m/veh for the peak hours and 8.0 m/veh for the off-peak hours. For the movements with multiple lanes, since the accuracy of the trajectory data cannot reach the lane level, the stopping vehicles are randomly assigned to the different lanes. The random assignment process is repeated 50 times to get an average estimate. Signal timing information from other data sources is not necessarily needed, as the trajectory data of the probe vehicles already contain some signal timing information. For instance, if the observed partial queue changes from (0, , , , , , 1) to (0, , , , , ,

Figure 12: Accuracy of three typical cameras: (a) camera 1, (b) camera 2, (c) camera 3
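The headway-based snapshot processing and the repeated random lane assignment described for the real-world data could look roughly like the sketch below. The text does not specify how stopping distances are converted to queue positions, so the floor-division rule, the function names, and the `estimator` callback are all assumptions made for illustration.

```python
import random
from statistics import mean


def queue_position(stop_distance_m: float, headway_m: float) -> int:
    """1-based queue position of a vehicle stopped stop_distance_m upstream
    of the stop bar, assuming a constant space headway (an assumption)."""
    return int(stop_distance_m // headway_m) + 1


def average_over_lane_assignments(positions, n_lanes, estimator, repeats=50, seed=0):
    """Average estimator(lanes) over repeated random lane assignments.

    lanes is a list of n_lanes sorted position lists. estimator is any
    user-supplied callable and stands in for the actual estimation method.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        lanes = [[] for _ in range(n_lanes)]
        for pos in positions:
            lanes[rng.randrange(n_lanes)].append(pos)
        results.append(estimator([sorted(lane) for lane in lanes]))
    return mean(results)


if __name__ == "__main__":
    # Hypothetical stopping distances in metres, peak-hour headway of 7.5 m/veh.
    positions = [queue_position(d, 7.5) for d in (3.0, 18.0, 40.0)]
    # Placeholder estimator: the longest observed partial queue over all lanes.
    longest = average_over_lane_assignments(
        positions, n_lanes=2,
        estimator=lambda lanes: max(lane[-1] for lane in lanes if lane))
    print(positions, longest)
```

Averaging over many random assignments reduces the dependence of the final estimate on any single arbitrary split of the vehicles across lanes.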

Figure 13 shows the results of traffic volume estimation for the studied through movements in six different time slots. The estimation results show that the applied method $\frac{Q^{probe}}{\hat{Q}^{obs} + \hat{Q}^{hid}(p)} = p$ can estimate traffic volume very accurately, which would be sufficient for most applications of mid-term or long-term signal control and performance measures. Figure 14 shows the results for the left-turn movements. The degraded performance further verifies the effect of the arrival rate on the estimation accuracy studied using the simulation data, since the traffic volumes of the left-turn movements are much smaller than those of the through movements.

Compared to the results of the simulation data, the estimation accuracy is undermined when the method is applied to the real-world data, for the following reasons. First, although the map matching algorithm (Newson and Krumm, 2009) can mitigate the effect of GPS errors at the data preprocessing stage, the errors in the real-world trajectory data could still influence the estimation accuracy. Second, in the real world, for each movement and each one-hour time slot, the penetration rate and the queueing pattern might vary slightly during the studied 15 workdays. Third, the average space headway for the queueing vehicles is set empirically, which might introduce some biases into the results. If data with better accuracy are available, the value of the average space headway should be estimated independently for each movement and each time slot.

Figure 13: Traffic volume estimation results for the through movements in different TODs: (a) 08:00-09:00, (b) 10:00-11:00, (c) 12:00-13:00, (d) 14:00-15:00, (e) 16:00-17:00, (f) 18:00-19:00

Figure 14: Traffic volume estimation results for the left-turn movements in different TODs: (a) 08:00-09:00, (b) 10:00-11:00, (c) 12:00-13:00, (d) 14:00-15:00, (e) 16:00-17:00, (f) 18:00-19:00

Conclusions

This paper proposes a general framework and a series of methods for the trajectory-based queue length and traffic volume estimation. For each specific movement and each specific time slot, the penetration rate of the probe vehicles is estimated by using the aggregated historical trajectory data of the probe vehicles. Once the penetration rate is estimated, it can be used to project the queue length and the traffic volume.

The proposed methods do not assume the type of vehicle arrival process or the queueing process. Therefore, the proposed methods are adaptable to both under-saturation and over-saturation cases. The proposed methods do not require high penetration rates and would be feasible for use in reality nowadays. The tests by both the simulation and the real-world data show good estimation accuracy, indicating that the proposed methods could be used for traffic signal control and performance measures at signalized intersections.

There are certain limitations in the current work that should be addressed in the future. For instance, the proposed methods in this paper take the stopping positions of the probe vehicles as the features to infer the penetration rate of the probe vehicles. However, there might not be queues forming at the non-signalized intersections or in the right-turn movements. Also, the queueing patterns in the shared left-through (right-through) lanes could be different from other left-turn (right-turn) lanes or through lanes. Therefore, when applying the proposed methods to these cases, additional care is required.

References

An, C., Wu, Y.-J., Xia, J., Huang, W., 2018. Real-time queue length estimation using event-based advance detector data. Journal of Intelligent Transportation Systems 22 (4), 277–290.

Badillo, B. E., Rakha, H., Rioux, T. W., Abrams, M., 2012. Queue length estimation using conventional vehicle detector and probe vehicle data. In: Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on. IEEE, pp. 1674–1681.

Ban, X. J., Hao, P., Sun, Z., 2011. Real time queue length estimation for signalized intersections using travel times from mobile sensors. Transportation Research Part C: Emerging Technologies 19 (6), 1133–1156.

Cai, Q., Wang, Z., Zheng, L., Wu, B., Wang, Y., 2014. Shock wave approach for estimating queue length at signalized intersections by fusing data from point and mobile sensors. Transportation Research Record: Journal of the Transportation Research Board 2422, 79–87.

Cetin, M., 2012. Estimating queue dynamics at signalized intersections from probe vehicle data: Methodology based on kinematic wave model. Transportation Research Record: Journal of the Transportation Research Board 2315 (1), 164–172.

Comert, G., 2013a. Effect of stop line detection in queue length estimation at traffic signals from probe vehicles data. European Journal of Operational Research 226 (1), 67–76.

Comert, G., 2013b. Simple analytical models for estimating the queue lengths from probe vehicles at traffic signals. Transportation Research Part B: Methodological 55, 59–74.

Comert, G., 2016. Queue length estimation from probe vehicles at isolated intersections: Estimators for primary parameters. European Journal of Operational Research 252 (2), 502–521.

Comert, G., Cetin, M., 2009. Queue length estimation from probe vehicle location and the impacts of sample size. European Journal of Operational Research 197 (1), 196–202.

Comert, G., Cetin, M., 2011. Analytical evaluation of the error in queue length estimation at traffic signals from probe vehicle data. IEEE Transactions on Intelligent Transportation Systems 12 (2), 563–573.

Guo, Q., Li, L., Ban, X. J., 2019. Urban traffic signal control with connected and automated vehicles: A survey. Transportation Research Part C: Emerging Technologies 101, 313–334.

Hao, P., Ban, X. J., 2015. Long queue estimation for signalized intersections using mobile data. Transportation Research Part B: Methodological 82, 54–73.

Hao, P., Ban, X. J., Whon Yu, J., 2015. Kinematic equation-based vehicle queue location estimation method for signalized intersections using mobile sensor data. Journal of Intelligent Transportation Systems 19 (3), 256–272.

Lee, S., Wong, S. C., Li, Y. C., 2015. Real-time estimation of lane-based queue lengths at isolated signalized junctions. Transportation Research Part C: Emerging Technologies 56, 1–17.

Li, F., Tang, K., Yao, J., Li, K., 2017. Real-time queue length estimation for signalized intersections using vehicle trajectory data. Transportation Research Record: Journal of the Transportation Research Board 2623, 49–59.

Li, J., Zhou, K., Shladover, S. E., Skabardonis, A., 2013. Estimating queue length under connected vehicle technology: Using probe vehicle, loop detector, fused data. Transportation Research Record: Journal of the Transportation Research Board 2366 (1), 17–22.

Liu, H. X., Wu, X., Ma, W., Hu, H., 2009. Real-time queue length estimation for congested signalized intersections. Transportation Research Part C: Emerging Technologies 17 (4), 412–427.

Merris, R., 2003. Combinatorics. Vol. 67. John Wiley & Sons.

Newson, P., Krumm, J., 2009. Hidden Markov map matching through noise and sparseness. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, pp. 336–343.

Ramezani, M., Geroliminis, N., 2015. Queue profile estimation in congested urban networks with probe data. Computer-Aided Civil and Infrastructure Engineering 30 (6), 414–432.

Rompis, S. Y., Cetin, M., Habtemichael, F., 2018. Probe vehicle lane identification for queue length estimation at intersections. Journal of Intelligent Transportation Systems 22 (1), 10–25.

Shahrbabaki, M. R., Safavi, A. A., Papageorgiou, M., Papamichail, I., 2018. A data fusion approach for real-time traffic state estimation in urban signalized links. Transportation Research Part C: Emerging Technologies 92, 525–548.

Wang, S., Huang, W., Lo, H. K., 2019. Traffic parameters estimation for signalized intersections based on combined shockwave analysis and Bayesian network. Transportation Research Part C: Emerging Technologies 104, 22–37.

Wang, Z., Cai, Q., Wu, B., Zheng, L., Wang, Y., 2017. Shockwave-based queue estimation approach for undersaturated and oversaturated signalized intersections using multi-source detection data. Journal of Intelligent Transportation Systems 21 (3), 167–178.

Wong, W., Shen, S., Zhao, Y., Liu, H. X., 2019a. On the estimation of connected vehicle penetration rate based on single-source connected vehicle data. Transportation Research Part B: Methodological 126, 169–191.

Wong, W., Wong, S., 2015. Systematic bias in transport model calibration arising from the variability of linear data projection. Transportation Research Part B: Methodological 75, 1–18.

Wong, W., Wong, S., 2019. Unbiased estimation methods of nonlinear transport models based on linearly projected data. Transportation Science 53 (3), 665–682.

Wong, W., Wong, S. C., 2016a. Biased standard error estimations in transport model calibration due to heteroscedasticity arising from the variability of linear data projection. Transportation Research Part B: Methodological 88, 72–92.

Wong, W., Wong, S. C., 2016b. Evaluation of the impact of traffic incidents using GPS data. Proceedings of the Institution of Civil Engineers-Transport 169 (3), 148–162.

Wong, W., Wong, S. C., Liu, H. X., 2019b. Bootstrap standard error estimations of nonlinear transport models based on linearly projected data. Transportmetrica A: Transport Science 15 (2), 602–630.

Zhan, X., Zheng, Y., Yi, X., Ukkusuri, S. V., 2017. Citywide traffic volume estimation using trajectory data. IEEE Transactions on Knowledge & Data Engineering 2, 272–285.

Zhao, Y., Zheng, J., Wong, W., Wang, X., Meng, Y., Liu, H. X., 2019. Estimation of queue lengths, probe vehicle penetration rates, and traffic volumes at signalized intersections using probe vehicle trajectories. Transportation Research Record: Journal of the Transportation Research Board.

Zheng, J., Liu, H. X., 2017. Estimating traffic volumes for signalized intersections using connected vehicle data. Transportation Research Part C: Emerging Technologies 79, 347–362.

Appendix A

Definitions

For $k, n \in \mathbb{N}$ and $n \ge k$,

$$C_n^k = \frac{n!}{k!\,(n-k)!}, \tag{A.1}$$

$$A_n^k = \frac{n!}{(n-k)!}. \tag{A.2}$$

Theorem 1

For conciseness, $Q_i, N_i, S_i, T_i, n_i, s_i, t_i$ are represented by $Q, N, S, T, n, s, t$, respectively.

$$E(S \mid N = n, Q = l) = \frac{l+1}{n+1}, \tag{A.3}$$

$$E(Q \mid N = n) = E(S \mid N = n)(n+1) - 1, \tag{A.4}$$

where $n \ge 1$.

\begin{align}
E(S \mid N = n, Q = l) &= \sum_{j=1}^{l-n+1} P(S = j \mid N = n, Q = l)\, j \tag{A.5} \\
&= \sum_{j=1}^{l-n+1} \frac{n\, C_{l-n}^{j-1} A_{j-1}^{j-1} A_{l-j}^{l-j}}{A_l^l}\, j \tag{A.6} \\
&= \sum_{j=1}^{l-n+1} \frac{n A_{l-j}^{n-1}}{A_l^n}\, j \tag{A.7} \\
&= \frac{n}{A_l^n} \sum_{j=1}^{l-n+1} A_{l-j}^{n-1}\, j \tag{A.8} \\
&= \frac{n}{A_l^n} \sum_{k=0}^{l-n} A_{n+k-1}^{n-1} (l - n + 1 - k) \tag{A.9} \\
&= \frac{n}{A_l^n} \sum_{k=0}^{l-n} A_{n+k-1}^{n-1} (l+1) - \frac{n}{A_l^n} \sum_{k=0}^{l-n} A_{n+k-1}^{n-1} (n+k) \tag{A.10} \\
&= (l+1) \sum_{k=0}^{l-n} \frac{(n+k-1)!\,(l-n)!\,n!}{k!\,l!\,(n-1)!} - \frac{n}{A_l^n} \sum_{k=0}^{l-n} A_{n+k}^{n} \tag{A.11} \\
&= \frac{l+1}{C_l^n} \sum_{k=0}^{l-n} C_{n+k-1}^{n-1} - \frac{n}{C_l^n} \sum_{k=0}^{l-n} C_{n+k}^{n} \tag{A.12} \\
&= (l+1) \frac{C_l^n}{C_l^n} - n \frac{C_{l+1}^{n+1}}{C_l^n} \tag{A.13} \\
&= (l+1) - n \frac{l+1}{n+1} \tag{A.14} \\
&= \frac{l+1}{n+1} \tag{A.15}
\end{align}

Chu's theorem (Merris, 2003) is applied when converting equation (A.12) to equation (A.13).

Then, based on the results above,

\begin{align}
E(S \mid N = n) &= \sum_{j=1}^{L_{max}} P(S = j \mid N = n)\, j \tag{A.16} \\
&= \sum_{j=1}^{L_{max}} \sum_{l=j+n-1}^{L_{max}} P(S = j \mid N = n, Q = l)\, P(Q = l \mid N = n)\, j \tag{A.17} \\
&= \sum_{l=n}^{L_{max}} \sum_{j=1}^{l-n+1} P(S = j \mid N = n, Q = l)\, P(Q = l \mid N = n)\, j \tag{A.18} \\
&= \sum_{l=n}^{L_{max}} P(Q = l \mid N = n) \sum_{j=1}^{l-n+1} P(S = j \mid N = n, Q = l)\, j \tag{A.19} \\
&= \sum_{l=n}^{L_{max}} P(Q = l \mid N = n)\, E(S \mid N = n, Q = l) \tag{A.20} \\
&= \sum_{l=n}^{L_{max}} P(Q = l \mid N = n)\, \frac{l+1}{n+1} \tag{A.21} \\
&= \frac{1}{n+1} \sum_{l=n}^{L_{max}} P(Q = l \mid N = n)(l+1) \tag{A.22} \\
&= \frac{1}{n+1} \left( E(Q \mid N = n) + 1 \right). \tag{A.23}
\end{align}

This is equivalent to

$$E(Q \mid N = n) = E(S \mid N = n)(n+1) - 1. \tag{A.24}$$

Theorem 2

For conciseness, $Q_i, N_i, S_i, T_i, n_i, s_i, t_i$ are represented by $Q, N, S, T, n, s, t$, respectively.

$$E(T \mid N = n, Q = l) = n\,\frac{l+1}{n+1}, \tag{A.25}$$

$$E(Q \mid N = n) = E(T \mid N = n)\,\frac{n+1}{n} - 1, \tag{A.26}$$

where $n \ge 1$.

\begin{align}
E(T \mid N = n, Q = l) &= \sum_{j=n}^{l} P(T = j \mid N = n, Q = l)\, j \tag{A.27} \\
&= \sum_{j=n}^{l} \frac{n\, C_{l-n}^{l-j} A_{j-1}^{j-1} A_{l-j}^{l-j}}{A_l^l}\, j \tag{A.28} \\
&= \sum_{j=n}^{l} \frac{n A_{j-1}^{n-1}}{A_l^n}\, j \tag{A.29} \\
&= n \sum_{j=n}^{l} \frac{A_j^n}{A_l^n} \tag{A.30} \\
&= n \sum_{j=n}^{l} \frac{C_j^n}{C_l^n} \tag{A.31} \\
&= \frac{n}{C_l^n} \sum_{k=0}^{l-n} C_{n+k}^{n} \tag{A.32} \\
&= n \frac{C_{l+1}^{n+1}}{C_l^n} \tag{A.33} \\
&= n\,\frac{l+1}{n+1} \tag{A.34}
\end{align}

Then, based on the results above,

\begin{align}
E(T \mid N = n) &= \sum_{j=n}^{L_{max}} P(T = j \mid N = n)\, j \tag{A.35} \\
&= \sum_{j=n}^{L_{max}} \sum_{l=j}^{L_{max}} P(T = j \mid N = n, Q = l)\, P(Q = l \mid N = n)\, j \tag{A.36} \\
&= \sum_{l=n}^{L_{max}} \sum_{j=n}^{l} P(T = j \mid N = n, Q = l)\, P(Q = l \mid N = n)\, j \tag{A.37} \\
&= \sum_{l=n}^{L_{max}} P(Q = l \mid N = n) \sum_{j=n}^{l} P(T = j \mid N = n, Q = l)\, j \tag{A.38} \\
&= \sum_{l=n}^{L_{max}} P(Q = l \mid N = n)\, E(T \mid N = n, Q = l) \tag{A.39} \\
&= \sum_{l=n}^{L_{max}} P(Q = l \mid N = n)\, n\,\frac{l+1}{n+1} \tag{A.40} \\
&= \frac{n}{n+1} \sum_{l=n}^{L_{max}} P(Q = l \mid N = n)(l+1) \tag{A.41} \\
&= \frac{n}{n+1} \left( E(Q \mid N = n) + 1 \right). \tag{A.42}
\end{align}

This is equivalent to

$$E(Q \mid N = n) = E(T \mid N = n)\,\frac{n+1}{n} - 1. \tag{A.43}$$

Theorem 3

For conciseness, $Q_i, N_i, S_i, T_i, n_i, s_i, t_i$ are represented by $Q, N, S, T, n, s, t$, respectively.

$$E(Q \mid N \ge 1) = E(S \mid N \ge 1) + E(T \mid N \ge 1) - 1$$

\begin{align}
\cdots &= \sum_{l=1}^{L_{max}} \sum_{k=1}^{l} P(T = k \mid N \ge 1, Q = l)\, P(Q = l \mid N \ge 1)(l-1) + 1 \tag{A.57} \\
&= \sum_{l=1}^{L_{max}} P(Q = l \mid N \ge 1)(l-1) + 1 \tag{A.58} \\
&= \sum_{l=1}^{L_{max}} P(Q = l \mid N \ge 1)\, l \tag{A.59} \\
&= E(Q \mid N \ge 1) \notag
\end{align}
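The conditional expectations in Theorems 1 to 3 can be checked by brute force for small queues. The sketch below assumes, as the counting argument in (A.6) and (A.28) does, that the $n$ probe vehicles occupy a uniformly random $n$-subset of the positions $1, \dots, l$. It enumerates every subset with exact rational arithmetic and also checks the summation identity used between (A.12) and (A.13). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def conditional_means(l: int, n: int) -> tuple[Fraction, Fraction]:
    """Exact E(S | N=n, Q=l) and E(T | N=n, Q=l) by enumerating all
    equally likely sets of n probe positions among 1..l."""
    subsets = list(combinations(range(1, l + 1), n))  # each tuple is sorted
    e_s = Fraction(sum(s[0] for s in subsets), len(subsets))   # first probe
    e_t = Fraction(sum(s[-1] for s in subsets), len(subsets))  # last probe
    return e_s, e_t


def summation_identity_holds(l: int, n: int) -> bool:
    """Check sum_{k=0}^{l-n} C(n+k, n) == C(l+1, n+1)."""
    return sum(comb(n + k, n) for k in range(l - n + 1)) == comb(l + 1, n + 1)


if __name__ == "__main__":
    for l in range(1, 9):
        for n in range(1, l + 1):
            e_s, e_t = conditional_means(l, n)
            assert e_s == Fraction(l + 1, n + 1)      # (A.3)
            assert e_t == Fraction(n * (l + 1), n + 1)  # (A.25)
            assert e_s + e_t - 1 == l                 # Theorem 3, given Q = l
            assert summation_identity_holds(l, n)
    print("all identities hold for l <= 8")
```

Because $E(S) + E(T) - 1 = l$ holds for every $n \ge 1$ when $Q = l$, averaging over $n$ and $l$ gives the statement of Theorem 3 without needing to know the number of probe vehicles in each cycle.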