Short-Term Traffic Flow Prediction Using Variational LSTM Networks
Mehrdad Farahani, Marzieh Farahani, Mohammad Manthouri, Okyay Kaynak
A Preprint
Mehrdad Farahani
Department of Computer Engineering, Islamic Azad University North Tehran Branch, Tehran, Iran
[email protected]

Marzieh Farahani
Department of Computing Science, Umeå University, Umeå, Sweden
[email protected]

Mohammad Manthouri
Department of Electrical and Electronic Engineering, Shahed University, Tehran, Iran
[email protected]

Okyay Kaynak
Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
[email protected]

Abstract
Traffic flow characteristics are among the most critical decision-making and traffic-policing factors in a region. Awareness of the predicted status of the traffic flow is of prime importance in traffic management and traffic information divisions. The purpose of this research is to suggest a forecasting model for traffic flow, using deep learning techniques based on historical data, in the Intelligent Transportation Systems area. The historical data were collected from the Caltrans Performance Measurement System (PeMS) for six months in 2019. The proposed prediction model is a Variational Long Short-Term Memory Encoder, in brief VLSTM-E, which tries to estimate the flow more accurately than other conventional methods. VLSTM-E can provide more reliable short-term traffic flow predictions by considering the distribution of the data and its missing values.

Keywords: Traffic Flow Prediction · Short-term Prediction · Variational Encoder · Long Short-Term Memory
1 Introduction

Urban life has undergone many changes with the development of local communities. This transport transformation and traffic congestion lead to road-clogging, slower speeds, longer trip times, and increased vehicular queuing in most urban and suburban passages in the world. This issue triggers abundant problems such as air pollution and noise pollution and, in total, plays a massive role in quality-of-life reduction. Therefore, governors recognize intelligent traffic flow control systems as a priority plan for their countries. Traffic flow forecasting is a crucial step for obtaining time optimizers in a public traffic adaptive control system.

Traffic flow prediction is a significant issue for transport management on one side and for drivers and ordinary people on the other. These methods help managers recognize heavy traffic in the countryside, and using some predefined paradigms and protocols can avoid the incidence of long traffic jams. On the other hand, drivers and ordinary people can also make better decisions based on such predictions, contributing to decreased traffic levels. Therefore, predicting traffic flow characteristics in a geographical area is one of the most critical decision-making and policy-making factors, with a significant effect on urban traffic management. Traffic flow prediction is mainly divided into three categories [1]:

• Short-term forecasting (a time interval of 5 minutes to 30 minutes)
• Medium-term forecasting (a time interval of 30 minutes to several hours)
• Long-term forecasting (ranges of one day to several days)

The ultimate goal in this domain is to predict the traffic flow in a particular region from historical traffic data before it happens. However, unpredictable disturbances, including internal events on transportation routes (such as an accident or the collapse of part of the route) and unexpected external events (such as a flood or storm), make long-term forecasting insufficiently accurate, while medium-term or short-term forecasting can be reliable if correctly set up. In this research, the short-term case is taken into consideration. The hybrid deep learning method predicts the flow based on a complex generative model of the data, which can recognize the spatial and temporal correlation within the sequence of traffic flows in a particular range. Furthermore, in the following, the recommended model is compared to other state-of-the-art models.

The contribution of this paper can be summarized as follows:

• Presenting a novel hybrid deep learning model based on a Variational Long Short-Term Memory Encoder (VLSTM-E)
• Considering the distribution of the data when forecasting short-term traffic flow
• Taking into consideration, through that distribution, the missing data caused by sensor failures

The paper is organized as follows: the next section gives a brief description of terminologies, challenges, and other methods of short-term traffic forecasting research concerning several neural network techniques. In section 3, the background of the model is introduced. Then, in section 4, the suggested model is presented. The dataset is described in section 5, and the results and performance evaluation are presented in section 6. Finally, conclusions and future research are stated in section 7.
2 Related Work

Traffic flow forecasting is one of the most useful tools in intelligent transportation systems (ITS). It allows the system to operate in an automatic control state and anticipate events before they occur. It can predict and assess the states and prepare itself for logical decision-making at the machine level, and manage the condition based on human-made protocols [2]. Meanwhile, short-term prediction of the traffic flow is more critical than the other two categories in the field of intelligent transportation systems, and much research and development has been done on it, both academically and operationally [2]. A great deal of research on short-term forecasting models can be classified into two main categories:

• Parametric, including methods such as state-space methods [3], Kalman filter methods [4], spectral analysis methods [5], statistical techniques [6], ARIMA, ARIMAX, and SARIMA models [7, 8, 9], and Markov models [10, 11].
• Nonparametric. In these models, with non-linear backgrounds, we try to find the model with the most receptive learning features. Much research has obtained remarkable results with this insight, including non-parametric regression techniques [12, 13, 14], k-nearest neighbor models [15], fuzzy techniques [16, 17, 18], neural networks [19, 20, 21, 22, 23], and support vector machines [24, 25, 26].

The spatial-temporal real-time information gathered by traffic sensors around the country is one of the signs of technological advancement that provides valuable facilities for the country's transportation systems. This information provides a massive number of patterns and paradigms of terrestrial transport in a geographic location. Moreover, the direct and indirect effects of that information lay the foundation for the application of deep learning networks.
Deep learning is a branch of machine learning that allows short-term forecasts of traffic flows to find latent dependence relationships in a set of patterns with high-dimensional explanatory variables. Such a model tries to detect extreme disturbances in the traffic flow within a pool of latent relations provided by real-time sensors [27, 28]. Nevertheless, there is no clue as to which type of deep learning model is the most appropriate for forecasting traffic flows. All of these models try to find part of these latent relations by presenting a different structure.

For example, the Stacked Autoencoders model, introduced with time and space correlation in mind, was able to learn the general characteristics of the traffic flow [29]. Other models that were able to achieve better performance are the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks [30]. These models provided a solution for obtaining better results as the length of the information sequences increases. It is necessary to take into account the effects of the time before and after each day, since the performance of these models degrades significantly due to the accumulation of errors. The LSTM+ model in [31] made it possible to achieve better performance by considering these effects.
In addition to predicting traffic flow behavior, which is one of the main purposes of traffic flow prediction, traffic sensors are usually controlled manually, so the collections of data from sensors come with various lengths, irregular sampling, and missing data. These dissonances make the prediction complicated. To solve this challenge, researchers proposed a model based on Long Short-Term Memory in [32]. Also, Convolutional Neural Network models, which have shown their ability to resolve image problems, have been used in this domain and could provide excellent results in predicting the traffic flow [33].
3 Background

The central core of the proposed model is divided into two parts: the variational part and the Long Short-Term Memory (LSTM). In the following, each part is introduced in detail.
3.1 Long Short-Term Memory

Long short-term memory (LSTM), as shown in Fig. (1), proposed by [34], is a recursive neural network architecture that is capable of learning long-term dependencies. This model was developed to deal with vanishing gradient problems and is considered a deep neural network architecture over time. The main component of the long short-term memory layer is the memory cell.

Figure 1: Long short-term memory cell.

A memory cell consists of four main elements: an input gate, a neuron with a recurrent connection, a forget gate, and an output gate. The following equations show the step-by-step operation of a layer of memory cells for an input time series X = (x_1, x_2, x_3, ..., x_n) and hidden states H = (h_1, h_2, h_3, ..., h_n):

i_t = σ(x_t U^i + h_{t-1} W^i)   (1)
f_t = σ(x_t U^f + h_{t-1} W^f)   (2)
o_t = σ(x_t U^o + h_{t-1} W^o)   (3)
C̃_t = tanh(x_t U^g + h_{t-1} W^g)   (4)
C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t   (5)
h_t = tanh(C_t) ∗ o_t   (6)

The ∗ sign in these calculations denotes element-wise multiplication, and, ignoring the bias terms, they show how the hidden layer is calculated at time t. In the calculations above:

• i, f, o are called the input, forget, and output gates, respectively.
• U^i, U^f, U^o are the weights that connect the input at time t to the hidden layer.
• W^i, W^f, W^o are the weights that connect the hidden layer at time t-1 to the hidden layer at time t.

At the end of the weighted non-linear calculation in each gate, the output enters a sigmoid activation function so as to simulate the gating concept, since the sigmoid activation function, as shown in Eq. (7), with a range from 0 to 1, can provide a gateway in the sense of open or closed.
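Eqs. (1)-(6) amount to one memory-cell update per time step. A minimal NumPy sketch of a single step (weight shapes and random initialization are illustrative assumptions, not the paper's trained parameters):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One memory-cell update following Eqs. (1)-(6); bias terms omitted."""
    U, W = params["U"], params["W"]  # input and recurrent weights, one per gate
    i = sigmoid(x_t @ U["i"] + h_prev @ W["i"])   # input gate, Eq. (1)
    f = sigmoid(x_t @ U["f"] + h_prev @ W["f"])   # forget gate, Eq. (2)
    o = sigmoid(x_t @ U["o"] + h_prev @ W["o"])   # output gate, Eq. (3)
    g = np.tanh(x_t @ U["g"] + h_prev @ W["g"])   # candidate state, Eq. (4)
    c_t = f * c_prev + i * g                      # cell state, Eq. (5)
    h_t = np.tanh(c_t) * o                        # hidden state, Eq. (6)
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_hid = 1, 4
params = {
    "U": {k: rng.normal(size=(d_in, d_hid)) for k in "ifog"},
    "W": {k: rng.normal(size=(d_hid, d_hid)) for k in "ifog"},
}
h = c = np.zeros(d_hid)
for x_t in rng.normal(size=(12, d_in)):  # a 12-step look-back window
    h, c = lstm_step(x_t, h, c, params)
```

Because h_t = tanh(C_t) ∗ o_t with o_t in (0, 1), the hidden state stays bounded in (-1, 1), which is part of what makes the recurrence stable over long sequences.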
σ(x) = 1 / (1 + e^{-x})   (7)

In long short-term memory networks, the objective function can differ depending on the structure of the problem; cross-entropy, softmax, and l2 quadratic losses are common choices.

3.2 Variational Autoencoder

Before turning to the variational part, it is necessary to get acquainted with the concept of an autoencoder [35]. The autoencoder network is a bipartite neural network trained to compress the information by forcing an encoder network to output a low-dimensional representation z, which is then consumed by a decoder network to reproduce the original data, as shown in Fig. (2).

Figure 2: Autoencoder model architecture.

Concerning the variational part [36], the goal is a model in which reproduction does not depend only on the data. A Variational Autoencoder tries to decode data from some known probability distribution, in this case a Gaussian distribution produced by the encoding part, so that it can generate reasonable outputs even when they do not encode actual data, as shown in Fig. (3).

Suppose x = x^(1), x^(2), x^(3), ..., x^(N) is a set of observed variables and z = z^(1), z^(2), z^(3), ..., z^(M) is a set of hidden variables with joint distribution p(z, x). Label this distribution p_θ, parameterized by θ, used to generate samples that look like real data points x^(i), as shown in Fig. (4). The inference problem is then to calculate the conditional distribution of the hidden variables given the observations, that is, p_θ(z|x), which can be written as shown in Eq. (8):

p_θ(z|x) = p_θ(z, x) / p_θ(x),   where   p_θ(x) = ∫ p_θ(x|z) p_θ(z) dz   (8)

Unfortunately, computing p_θ(x) is quite difficult because it is very expensive to check all the possible values of z and sum them up. To solve this issue, we approximate p_θ(z|x) by another distribution q_φ(z|x) and then perform approximate inference of the intractable distribution.
To ensure that q_φ(z|x) and p_θ(z|x) are similar to each other, we can minimize the KL divergence between these two distributions, as shown in Eq. (9).
Figure 3: Variational Autoencoder model with the multivariate Gaussian assumption.

Figure 4: The graphical model of the Variational Autoencoder. Solid lines denote the generative distribution p_θ(z) p_θ(x|z), and dashed lines denote the distribution q_φ(z|x) that approximates the intractable posterior p_θ(z|x).

D_KL( q_φ(z|x) ‖ p_θ(z|x) )   (9)
  = ∫ q_φ(z|x) log [ q_φ(z|x) / p_θ(z|x) ] dz
  = ∫ q_φ(z|x) log [ q_φ(z|x) p_θ(x) / p_θ(z, x) ] dz
  = log p_θ(x) + D_KL( q_φ(z|x) ‖ p_θ(z) ) - E_{z∼q_φ(z|x)} log p_θ(x|z)

Rearranging the left- and right-hand sides of the equation gives Eq. (10); the loss function is then the variational lower bound, or evidence lower bound, as shown in Eq. (11):

log p_θ(x) - D_KL( q_φ(z|x) ‖ p_θ(z|x) )   (10)
  = E_{z∼q_φ(z|x)} log p_θ(x|z) - D_KL( q_φ(z|x) ‖ p_θ(z) )
L_VAE(θ, φ) = - log p_θ(x) + D_KL( q_φ(z|x) ‖ p_θ(z|x) )   (11)
  = - E_{z∼q_φ(z|x)} log p_θ(x|z) + D_KL( q_φ(z|x) ‖ p_θ(z) )

θ*, φ* = arg min_{θ,φ} L_VAE
Therefore, by minimizing the loss we are maximizing the lower bound of the probability of generating real data samples, as in Eq. (12):

- L_VAE = log p_θ(x) - D_KL( q_φ(z|x) ‖ p_θ(z|x) ) ≤ log p_θ(x)   (12)

4 Proposed Model

Following the previous approaches, the proposed model includes a Variational Autoencoder that uses LSTMs as its encoder and decoder parts, as shown in Fig. (5). The long short-term memory acts as an exploiter of both past and future information. Finally, a multi-layer perceptron (MLP) network is responsible for mapping the target to the samples of the distribution learned by the VLSTM-E.

Figure 5: Illustration of the proposed model architecture.

In this approach, the network simultaneously learns the distribution of z and feeds samples from that distribution into the multilayer perceptron model to estimate the traffic flow.
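In practice, when q_φ(z|x) = N(μ, diag(σ²)) and the prior p_θ(z) = N(0, I), the KL term in Eq. (11) has a closed form, and samples of z are drawn with the standard reparameterization trick (z = μ + σ ⊙ ε, ε ∼ N(0, I)) so that the sampling step stays differentiable in μ and σ. A NumPy sketch with illustrative (not paper-trained) encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_to_standard_normal(mu, log_var):
    """Closed form of the KL term in Eq. (11) for
    q(z|x) = N(mu, diag(exp(log_var))) and p(z) = N(0, I):
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps, eps ~ N(0, I): a deterministic, differentiable
    function of (mu, log_var) plus external noise."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Illustrative encoder outputs for a batch of windows, latent dimension 2
mu = np.tile([1.0, -2.0], (100_000, 1))
log_var = np.tile([0.0, np.log(0.25)], (100_000, 1))  # variances 1.0 and 0.25

z = reparameterize(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)
```

The sample mean and standard deviation of z recover (μ, σ) up to Monte Carlo error, confirming that the noise injection does not bias the encoder's distribution.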
Figure 6: The traffic flow between two stations on the San Bernardino Fwy.

5 Dataset

The Caltrans Performance Measurement System (PeMS) is used as a public dataset. It is collected in real time by more than 39,000 individual detectors across all major metropolitan areas of the state of California. The Performance Measurement System provides a significantly varied source of traffic data integrated from Caltrans and other local agency systems.

In this paper, the traffic flow dataset consists of sensor information from the California area, district seven, between 2019-01-01 and 2019-05-30, with detections at five-minute intervals. In the case of sensor failures, some records have no values (missing data). In this scenario, a combination of spline interpolation and averaging over a 15-minute interval helps the model learn the inner patterns desirably. The dataset is then prepared in preprocessing steps. In this particular case, the proposed model is tested on the traffic flows of two points, stations 716076 and 717060, as shown in Fig. (6).

Then, for each record at time t, earlier records are selected as additional features; in other words, up to 12 earlier records are picked as a look-back. The data are then scaled with a min-max scaler. The data between 2019-01-01 00:00:00 and 2019-03-31 23:59:00 are chosen as the training set and the rest for testing, as shown in Table (1). Besides, typical daily traffic flow charts are presented in Fig. (7) for both the training and testing parts of the two stations.

Table 1: Dimensional division of the data into training and testing sets.
Station   X Train          Y Train    X Test           Y Test
716076    8628 × 12 × 1    8628 × 1   5778 × 12 × 1    5778 × 1
717060    8628 × 12 × 1    8628 × 1   6187 × 12 × 1    6187 × 1
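The preparation steps described above (filling sensor gaps, averaging 5-minute records into 15-minute bins, min-max scaling, and building 12-step look-back windows) can be sketched as follows. This is a simplified illustration on synthetic data: the paper uses spline interpolation, while plain linear interpolation stands in for it here.

```python
import numpy as np

def preprocess(flow_5min, lookback=12):
    """Sketch of the dataset preparation: fill missing values, average
    5-minute records into 15-minute bins, min-max scale to [0, 1], and
    build (window, next-value) pairs with a 12-step look-back."""
    t = np.arange(len(flow_5min))
    ok = ~np.isnan(flow_5min)
    filled = np.interp(t, t[ok], flow_5min[ok])          # fill sensor gaps
    binned = filled[: len(filled) // 3 * 3].reshape(-1, 3).mean(axis=1)
    lo, hi = binned.min(), binned.max()
    scaled = (binned - lo) / (hi - lo)                   # min-max scaling
    X = np.stack([scaled[i : i + lookback] for i in range(len(scaled) - lookback)])
    y = scaled[lookback:]
    return X[..., None], y                               # (n, 12, 1) and (n,)

rng = np.random.default_rng(0)
flow = rng.uniform(50, 400, size=300)                    # synthetic 5-min counts
flow[rng.choice(300, size=15, replace=False)] = np.nan   # simulated sensor failures
X, y = preprocess(flow)                                  # X: (88, 12, 1), y: (88,)
```

The resulting shapes mirror the n × 12 × 1 inputs and n × 1 targets of Table (1), with n determined by the number of 15-minute bins minus the look-back length.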
6 Experiments and Results

In terms of hardware, the GPU used is a Tesla K80, provided by Google Colab [37]. The proposed VLSTM-E architecture and the chosen networks were implemented on the TensorFlow platform (v1.14.0) [38]. The learning rate is 0.0001, the batch size is 256, and the sigmoid is used as the activation of the last layer.
Figure 7: Typical daily traffic flow pattern for the two stations 716076 and 717060. (a) Traffic flow from Tuesday 1 January 2019 to Saturday 5 January 2019 as a training example. (b) Traffic flow from Saturday 20 April 2019 to Wednesday 24 April 2019 as a testing example.
Four measurements are introduced in this paper to evaluate the effectiveness of the proposed model, as follows:

e_i = f_i - f̂_i   (13)
MSE = (1/n) Σ_{i=1}^{n} e_i²   (14)
RMSE = sqrt( (1/n) Σ_{i=1}^{n} e_i² )   (15)
MAE = (1/n) Σ_{i=1}^{n} |e_i|   (16)
MAPE = (100%/n) Σ_{i=1}^{n} |e_i / f_i|   (17)

where n is the number of test samples, f_i is the real traffic flow in sample i, and f̂_i denotes the predicted traffic flow.

In the following, the evaluation results and traffic flow forecasts are presented for VLSTM-E (Table (2), Fig. (8)), LSTM (Table (3), Fig. (9)), MCNNM (Table (4), Fig. (10)), and SAEs (Table (5), Fig. (11)), respectively.

Table 2: The evaluation results for the Variational Long Short-Term Memory Encoder (VLSTM-E) model.
VLSTM-E
Station ID   MAPE [%]   MAE      MSE      RMSE
716076       9.5954     0.0312   0.0018   0.0422
717060       8.8625     0.0276   0.0015   0.0381
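The metric definitions of Eqs. (13)-(17) translate directly to code; a small worked example on made-up flows (assuming, as MAPE requires, that all true flows f_i are nonzero):

```python
import numpy as np

def metrics(f, f_hat):
    """Eqs. (13)-(17): error, MSE, RMSE, MAE, and MAPE for real flow f
    and predicted flow f_hat (f must be nonzero for MAPE)."""
    e = f - f_hat                           # Eq. (13)
    mse = np.mean(e**2)                     # Eq. (14)
    rmse = np.sqrt(mse)                     # Eq. (15)
    mae = np.mean(np.abs(e))                # Eq. (16)
    mape = 100.0 * np.mean(np.abs(e / f))   # Eq. (17)
    return mse, rmse, mae, mape

f = np.array([100.0, 200.0, 400.0])         # illustrative true flows
f_hat = np.array([110.0, 190.0, 420.0])     # illustrative predictions
mse, rmse, mae, mape = metrics(f, f_hat)
# e = [-10, 10, -20], so MSE = 200, RMSE ≈ 14.142, MAE ≈ 13.333, MAPE ≈ 6.667 %
```

Note that the tables report these metrics on min-max-scaled flows, which is why the MAE/MSE/RMSE values are small; MAPE is scale-relative and so remains comparable across stations.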
Table 3: The evaluation results for the Long Short-Term Memory (LSTM) model.
LSTM
Station ID   MAPE [%]   MAE      MSE      RMSE
716076       10.2718    0.0341   0.0024   0.0490
717060       10.8174    0.0366   0.0022   0.0464
Figure 8: Typical daily traffic flow forecasting for the two stations 716076 and 717060 by the VLSTM-E model between Saturday 20 April 2019 and Wednesday 24 April 2019. (a) Traffic flow forecasting for 716076. (b) Traffic flow forecasting for 717060.
Figure 9: Typical daily traffic flow forecasting for the two stations 716076 and 717060 by the LSTM model between Saturday 20 April 2019 and Wednesday 24 April 2019. (a) Traffic flow forecasting for 716076. (b) Traffic flow forecasting for 717060.
Figure 10: Typical daily traffic flow forecasting for the two stations 716076 and 717060 by the MCNNM model between Saturday 20 April 2019 and Wednesday 24 April 2019. (a) Traffic flow forecasting for 716076. (b) Traffic flow forecasting for 717060.
Table 4: The evaluation results for the Multiple Convolutional Neural Network for Multivariate (MCNNM) model.
MCNNM
Station ID   MAPE [%]   MAE      MSE      RMSE
716076       31.0840    0.0757   0.0129   0.1136
717060       24.0724    0.0603   0.0082   0.0905
Table 5: The evaluation results for the Stacked Autoencoders (SAEs) model.
SAEs
Station ID   MAPE [%]   MAE      MSE      RMSE
716076       9.9421     0.0326   0.0020   0.0449
717060       18.4939    0.0560   0.0040   0.0635
As the results show, the proposed model, VLSTM-E, improves on other conventional models, namely the Stacked Autoencoders, Long Short-Term Memory, and Multiple Convolutional Neural Network models introduced in 2015 [29], 2016 [30], and 2019 [33]. To better illustrate this superiority, the average of the results for each evaluation criterion is presented in Table (6), which shows that the MSE score of the VLSTM-E is 0.0016.

Table 6: Average performance for all the models.
Average
Model        MAPE [%]   MAE      MSE      RMSE
VLSTM-E      9.2290     0.0294   0.0016   0.0402
LSTM [30]    10.5446    0.0353   0.0023   0.0477
MCNNM [33]   27.5782    0.0680   0.0106   0.1021
SAEs [29]    14.2180    0.0443   0.0030   0.0542
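Each row of Table (6) is simply the mean of the two station rows reported for that model in Tables (2)-(5); for example, the VLSTM-E row can be reproduced directly:

```python
import numpy as np

# Station rows for VLSTM-E from Table (2): [MAPE %, MAE, MSE, RMSE]
station_716076 = np.array([9.5954, 0.0312, 0.0018, 0.0422])
station_717060 = np.array([8.8625, 0.0276, 0.0015, 0.0381])

average = (station_716076 + station_717060) / 2
# Matches the VLSTM-E row of Table (6) up to rounding:
# MAPE ≈ 9.2290, MAE ≈ 0.0294, MSE ≈ 0.00165, RMSE ≈ 0.04015
```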
Figures (12, 13) show the prediction results for the two stations 716076 and 717060 on the test dataset on 20 April 2019. As can be seen, at both stations the VLSTM-E curve estimates the traffic flow better than the other curves. In cases where the traffic flow fluctuates at high volume, the model can quickly converge to that behavior; at low-volume volatility, it also shows a better response than the Long Short-Term Memory model. Perhaps the reason for this improvement can be found in the data structure: in some cases, the sensors at the stations cannot detect an observation, or the observation is not highly accurate. In other words, these sensors might fail at vehicle detection, causing missing values. Since the model works with the distribution of the data, and samples of this distribution are fed into the network, it can reduce the adverse effects of these missing data on the learning process and lead to more satisfactory results than other models such as Long Short-Term Memory.
7 Conclusion

This paper presents a deep learning approach with a Variational Long Short-Term Memory Encoder to predict short-term traffic flow. In contrast to previous approaches [30], this model considers the pattern of the data and provides a solution for missing data, so it achieves better results, based on the four evaluation criteria, than the other models introduced earlier [29, 30, 33]. The model is implemented on the PeMS dataset. For future work, it would be interesting to apply it to other datasets in which the stations and their sensors produce missing or low-value information. Also, various distributions, such as the Dirichlet distribution, could be useful for improving the sample distribution in traffic flow.
Figure 11: Typical daily traffic flow forecasting for the two stations 716076 and 717060 by the SAEs model between Saturday 20 April 2019 and Wednesday 24 April 2019. (a) Traffic flow forecasting for 716076. (b) Traffic flow forecasting for 717060.
Figure 12: Forecasting performance of the Variational Long Short-Term Memory Encoder (VLSTM-E), Long Short-Term Memory (LSTM), Multiple Convolutional Neural Network for Multivariate (MCNNM), and Stacked Autoencoders (SAEs) models for station 716076.
Figure 13: Forecasting performance of the Variational Long Short-Term Memory Encoder (VLSTM-E), Long Short-Term Memory (LSTM), Multiple Convolutional Neural Network for Multivariate (MCNNM), and Stacked Autoencoders (SAEs) models for station 717060.
References

[1] Zhongsheng Hou and Xingyi Li. Repeatability and similarity of freeway traffic flow and long-term prediction under big data. IEEE Transactions on Intelligent Transportation Systems, 17:1786–1796, 2016.
[2] Se do Oh, Young jin Kim, and Ji sun Hong. Urban traffic flow prediction system using a multifactor pattern recognition model. IEEE Transactions on Intelligent Transportation Systems, 16:2744–2755, 2015.
[3] Anthony Stathopoulos and Matthew G. Karlaftis. A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2):121–135, April 2003.
[4] Teng Zhou, Dazhi Jiang, Zhizhe Lin, Guoqiang Han, Xuemiao Xu, and Jing Qin. Hybrid dual Kalman filtering model for short-term traffic flow forecasting. IET Intelligent Transport Systems, 13(6):1023–1032, June 2019.
[5] Yanru Zhang, Yunlong Zhang, and Ali Haghani. A hybrid short-term traffic flow forecasting method based on spectral analysis and statistical volatility model. Transportation Research Part C: Emerging Technologies, 43:65–78, June 2014.
[6] Milan Krbálek, Jiří Apeltauer, and František Šeba. Traffic flow merging – statistical and numerical modeling of microstructure. Journal of Computational Science, 32:99–105, March 2019.
[7] Xianglong Luo, Liyao Niu, and Shengrui Zhang. An algorithm for traffic flow prediction based on improved SARIMA and GA. KSCE Journal of Civil Engineering, 22(10):4107–4115, October 2018.
[8] Qinzhong Hou, Junqiang Leng, Guosheng Ma, Weiyi Liu, and Yuxing Cheng. An adaptive hybrid model for short-term urban traffic flow prediction. Physica A: Statistical Mechanics and its Applications, 527:121065, August 2019.
[9] Chukwutoo C. Ihueze and Uchendu O. Onwurah. Road traffic accidents prediction modelling: An analysis of Anambra State, Nigeria. Accident Analysis & Prevention, 112:21–29, March 2018.
[10] Guangyu Zhu, Kang Song, Peng Zhang, and Li Wang. A traffic flow state transition model for urban road network based on hidden Markov model. Neurocomputing, 214:567–574, November 2016.
[11] Liguo Zhang and Christophe Prieur. Stochastic stability of Markov jump hyperbolic systems with application to traffic flow control. Automatica, 86:29–37, December 2017.
[12] Darong Huang and Xing rong Bai. A wavelet neural network optimal control model for traffic-flow prediction in intelligent transport systems. In Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, pages 1233–1244. Springer Berlin Heidelberg, 2007.
[13] Shaurya Agarwal, Pushkin Kachroo, and Emma Regentova. A hybrid model using logistic regression and wavelet transformation to detect traffic incidents. IATSS Research, 40(1):56–63, July 2016.
[14] Dick Apronti, Khaled Ksaibati, Kenneth Gerow, and Jaime Jo Hepner. Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods. Journal of Traffic and Transportation Engineering (English Edition), 3(6):493–506, December 2016.
[15] Pinlong Cai, Yunpeng Wang, Guangquan Lu, Peng Chen, Chuan Ding, and Jianping Sun. A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting. Transportation Research Part C: Emerging Technologies, 62:21–34, January 2016.
[16] A. Sharma, R. Vijay, G. L. Bodhe, and L. G. Malik. An adaptive neuro-fuzzy interface system model for traffic classification and noise prediction. Soft Computing, 22(6):1891–1902, November 2016.
[17] Jianhua Guo, Zhao Liu, Wei Huang, Yun Wei, and Jinde Cao. Short-term traffic flow prediction using fuzzy information granulation approach under different time intervals. IET Intelligent Transport Systems, 12(2):143–150, March 2018.
[18] Weihong Chen, Jiyao An, Renfa Li, Li Fu, Guoqi Xie, Md Zakirul Alam Bhuiyan, and Keqin Li. A novel fuzzy deep-learning approach to traffic flow prediction with uncertain spatial–temporal data features. Future Generation Computer Systems, 89:78–88, December 2018.
[19] Carl Goves, Robin North, Ryan Johnston, and Graham Fletcher. Short term traffic prediction on the UK motorway network using neural networks. Transportation Research Procedia, 13:184–195, 2016.
[20] Jithin Raj, Hareesh Bahuleyan, and Lelitha Devi Vanajakshi. Application of data mining techniques for traffic density estimation and prediction. Transportation Research Procedia, 17:321–330, 2016.
[21] Kui-Lin Li, Chun-Jie Zhai, and Jian-Min Xu. Short-term traffic flow prediction using a methodology based on ARIMA and RBF-ANN. IEEE, October 2017.
[22] Bharti Sharma, Sachin Kumar, Prayag Tiwari, Pranay Yadav, and Marina I. Nezhurina. ANN based short-term traffic flow forecasting in undivided two lane highway. Journal of Big Data, 5(1), December 2018.
[23] Jingyuan Wang, Yukun Cao, Ye Du, and Li Li. DST: A deep urban traffic flow prediction framework based on spatial-temporal features. In Knowledge Science, Engineering and Management, pages 417–427. Springer International Publishing, 2019.
[24] Anyu Cheng, Xiao Jiang, Yongfu Li, Chao Zhang, and Hao Zhu. Multiple sources and multiple measures based traffic flow prediction using the chaos theory and support vector regression method. Physica A: Statistical Mechanics and its Applications, 466:422–434, January 2017.
[25] Yuxing Sun, Biao Leng, and Wei Guan. A novel wavelet-SVM short-time passenger flow prediction in Beijing subway system. Neurocomputing, 166:109–121, October 2015.
[26] Jianli Xiao, Chao Wei, and Yuncai Liu. Speed estimation of traffic flow using multiple kernel support vector regression. Physica A: Statistical Mechanics and its Applications, 509:989–997, November 2018.
[27] Nicholas G. Polson and Vadim O. Sokolov. Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies, 79:1–17, June 2017.
[28] Yuankai Wu, Huachun Tan, Lingqiao Qin, Bin Ran, and Zhuxi Jiang. A hybrid deep learning based traffic flow prediction method and its understanding. Transportation Research Part C: Emerging Technologies, 90:166–180, May 2018.
[29] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. Traffic flow prediction with big data: A deep learning approach. IEEE Transactions on Intelligent Transportation Systems, pages 1–9, 2014.
[30] Rui Fu, Zuo Zhang, and Li Li. Using LSTM and GRU neural network methods for traffic flow prediction. IEEE, November 2016.
[31] Bailin Yang, Shulin Sun, Jianyuan Li, Xianxuan Lin, and Yan Tian. Traffic flow prediction using LSTM with feature enhancement. Neurocomputing, 332:320–327, March 2019.
[32] Yan Tian, Kaili Zhang, Jianyuan Li, Xianxuan Lin, and Bailin Yang. LSTM-based traffic flow prediction with missing data. Neurocomputing, 318:297–305, November 2018.
[33] Kang Wang, Kenli Li, Liqian Zhou, Yikun Hu, Zhongyao Cheng, Jing Liu, and Cen Chen. Multiple convolutional neural networks for multivariate time series prediction. Neurocomputing, May 2019.
[34] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, November 1997.
[35] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[36] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes, 2013.