IEEE Transactions on Vehicular Technology | 2021

Using Appearance to Predict Pedestrian Trajectories Through Disparity-Guided Attention and Convolutional LSTM


Abstract


Reasoning about the motion of other objects is important for moving platforms such as intelligent vehicles. In this paper, we propose a pedestrian trajectory prediction approach that uses pedestrian appearance, historical locations, and camera ego-motion to predict the future trajectories of pedestrians for vehicles. The proposed model consists of three encoders, namely an appearance encoder, a location encoder, and an ego-motion encoder, together with a Long Short-Term Memory (LSTM) decoder. The appearance encoder employs an embedding Convolutional Neural Network (CNN) and a Convolutional LSTM (ConvLSTM) to encode pedestrian appearance. A disparity-guided attention mechanism is designed in the appearance encoder to attend to salient dynamic regions of pedestrian appearance. The location encoder and ego-motion encoder employ LSTMs to encode pedestrian historical locations and camera ego-motion, respectively. The outputs of the three encoders are concatenated into a representation of the input information, which the decoder then decodes to generate the future trajectories of observed pedestrians. We evaluated the proposed model against several baselines on two public datasets, and the results showed an improvement of about 20% over the baselines. We also visualized the predicted trajectories and the heat maps generated by the disparity-guided attention. Both the visualizations and the quantitative results confirm that the proposed model outperforms the baselines in prediction accuracy and that it is able to capture salient spatiotemporal regions of pedestrian appearance.
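To make the described pipeline concrete, the following is a minimal numpy sketch of the fusion scheme the abstract outlines: three input streams are encoded, the appearance stream is reweighted by a disparity-derived attention map, the encoder outputs are concatenated, and a readout predicts future positions. All dimensions, the softmax form of the attention, the toy recurrence, and the single linear decoder are assumptions for illustration; the paper's actual encoders are a CNN+ConvLSTM and LSTMs, with an LSTM decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def disparity_guided_attention(features, disparity):
    """Weight spatial appearance features by an attention map derived from
    the disparity map (hypothetical formulation: larger-disparity, i.e.
    nearer, regions receive higher weight via a softmax)."""
    # features: (H*W, C) flattened feature map; disparity: (H*W,)
    weights = softmax(disparity)             # attention over spatial locations
    return weights @ features                # attended feature vector, (C,)

def encode_sequence(seq, W):
    """Toy recurrent stand-in for the LSTM encoders: embeds each 2-D
    step and accumulates a hidden state (not a real LSTM cell)."""
    h = np.zeros(W.shape[1])
    for step in seq:                         # seq: (T, 2)
        h = np.tanh(step @ W + h)
    return h

# Hypothetical dimensions: T observed steps, HxW feature map, C channels, D hidden units.
T, H, W_, C, D = 8, 4, 4, 16, 8

appearance = rng.normal(size=(H * W_, C))    # flattened CNN appearance features
disparity  = rng.normal(size=(H * W_,))      # per-location disparity values
locations  = rng.normal(size=(T, 2))         # past pedestrian (x, y) positions
ego_motion = rng.normal(size=(T, 2))         # camera ego-motion per step

# Encode each stream, then concatenate, mirroring the paper's fusion step.
f_app = disparity_guided_attention(appearance, disparity)        # (C,)
f_loc = encode_sequence(locations, rng.normal(size=(2, D)))      # (D,)
f_ego = encode_sequence(ego_motion, rng.normal(size=(2, D)))     # (D,)
fused = np.concatenate([f_app, f_loc, f_ego])                    # (C + 2D,)

# Single linear readout standing in for the LSTM decoder: predicts
# K future (x, y) positions from the fused representation.
K = 12
future = (fused @ rng.normal(size=(fused.size, K * 2))).reshape(K, 2)
print(future.shape)  # (12, 2)
```

The design point this illustrates is that the attention acts before fusion: disparity selects which appearance regions contribute, and only the pooled appearance vector is concatenated with the motion features.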

Volume 70
Pages 7480-7494
DOI 10.1109/TVT.2021.3094678
Language English
Journal IEEE Transactions on Vehicular Technology
