Frontiers in Neurorobotics | 2019

Neuromorphic Vision Datasets for Pedestrian Detection, Action Recognition, and Fall Detection


Abstract


Large-scale public datasets are vital for algorithm development in the computer vision field. Thanks to the availability of advanced sensors such as cameras, Lidar, and Kinect, massive, well-designed datasets created by researchers are freely available to the scientific and academic world. ImageNet (Deng et al., 2009) is one of the most representative examples and is widely used for image recognition tasks in computer vision. UCF 101 (Soomro et al., 2012) is another large-scale dataset, used for human action recognition. However, both datasets provide only the appearance information of objects in the scene. With the limited information carried by RGB images, certain problems are extremely difficult to solve, such as separating foreground from background when the two share similar colors and textures. With the release of the low-cost Kinect sensor in 2010, acquisition of RGB and depth data became cheaper and easier. Not surprisingly, a growing number of RGB-D datasets, recorded with the Kinect sensor and dedicated to a wide range of applications, have become available (Cai et al., 2017). The same trend occurred in the autonomous driving community with the KITTI dataset (Geiger et al., 2013), enabled by the availability of the Velodyne HDL-64E rotating 3D laser scanner. Clearly, the advent of new sensors always brings opportunities for new dataset development. In this data report, we introduce three new neuromorphic vision datasets recorded with a novel neuromorphic vision sensor, the Dynamic Vision Sensor (DVS) (Lichtsteiner et al., 2008).

The DVS is a novel type of neuromorphic vision sensor, developed by Lichtsteiner et al. (2008). The sensor records event streams as a sequence of tuples [t, x, y, p], where t is the timestamp of the event, (x, y) are the pixel coordinates of the event in 2D space, and p is the polarity of the event, indicating the direction of the brightness change. Compared with conventional frame-based cameras, neuromorphic vision sensors are frameless: they take a radically different approach and do away with images completely. This new paradigm properly addresses the universal drawbacks of conventional frame-based cameras, such as data redundancy, high latency, and low temporal resolution. The sensor has matured to the point of entering the commercial market only in the last decade. Because neuromorphic vision is a much younger field, one of its main challenges is the lack of datasets, which impedes its progress; here we can learn from the rapid development and maturation of computer vision.
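To make the event format concrete, the following minimal sketch (ours, not from the report) represents such a stream as a NumPy structured array. The field names, dtypes, and toy values are illustrative assumptions; actual DAVIS recordings are typically distributed in the AEDAT format.

```python
import numpy as np

# Hypothetical sketch: a DVS event stream as a structured array of
# [t, x, y, p] tuples. Field names and dtypes are our assumptions.
event_dtype = np.dtype([("t", np.uint64),   # timestamp in microseconds
                        ("x", np.uint16),   # pixel column
                        ("y", np.uint16),   # pixel row
                        ("p", np.uint8)])   # polarity: 1 = ON, 0 = OFF

# A toy stream of four events; real recordings contain millions.
events = np.array([(1000, 120, 64, 1),
                   (1012, 121, 64, 1),
                   (1020, 119, 65, 0),
                   (1100, 200, 30, 1)], dtype=event_dtype)

# Events are sparse and asynchronous: each pixel fires independently,
# so temporal resolution is set by the timestamp clock, not a frame rate.
on_events = events[events["p"] == 1]
print(f"{len(on_events)} ON events out of {len(events)}")
```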
There is no doubt that neuromorphic vision research will benefit from new datasets, just as computer vision research has. However, a unique difficulty arises because neuromorphic vision data differ significantly from conventional camera data, and no direct method exists for converting between the two formats. To address this, we introduce the largest neuromorphic vision datasets targeting three human-motion-related tasks: pedestrian detection, human action recognition, and human fall detection. We hope that these datasets will meet the significant demands of the neuromorphic vision, computer vision, and robotics communities. More specifically, the open access to the three datasets should stimulate the development of algorithms that process event-based asynchronous stream input. In addition, to allow for a fair comparison with frame-based computer vision, we also introduce three encoding methods which convert the spatio-temporal data format to conventional frames.

Previously, several datasets of neuromorphic vision sensors addressing the problems of detection and classification were proposed (Orchard et al., 2015; Serrano-Gotarredona and Linares-Barranco, 2015; Hu et al., 2016; Liu et al., 2016; Li et al., 2017). Many of them were recorded with a static DVS facing a monitor on which computer vision datasets were set to play automatically (Serrano-Gotarredona and Linares-Barranco, 2015; Hu et al., 2016); thus, the intrinsic temporal information of objects moving between two frames is lost. Gratifyingly, several high-quality datasets recorded in real environments have appeared in recent years (Moeys et al., 2016; Zhu et al., 2018), alongside other pioneering work from iniLabs1 and the RPG group2. The DDD17 dataset (Binas et al., 2017) is the first annotated driving dataset in event format, with which end-to-end prediction of a vehicle's steering angle can be achieved using a convolutional neural network. A dataset for pose estimation, visual odometry, and SLAM was published by Mueggler et al. (2017b).

It is noteworthy that, although many public datasets have been released by the neuromorphic vision community3, open-access datasets for human motion analysis are still lacking. Therefore, we aim to fill this gap by introducing three datasets in this report: the pedestrian detection dataset, the action recognition dataset, and the fall detection dataset. A DAVIS346redColor sensor4 is used for recording. Alongside the datasets, this report presents three encoding methods, based respectively on the frequency of events (Chen, 2018), the surface of active events (Mueggler et al., 2017a), and the Leaky Integrate-and-Fire (LIF) neuron model (Burkitt, 2006). We conclude this report with the recording details and summaries of the datasets and encoding methods.
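To illustrate how such event-to-frame encodings can work, the sketch below implements two of them in the spirit of the cited works: an event-frequency frame and a surface of active events (SAE). The window length, normalization, and function names are our assumptions, not the authors' exact formulations; the LIF encoding (Burkitt, 2006) would instead integrate events into per-pixel membrane potentials that decay over time and fire where a threshold is crossed, and is omitted here for brevity.

```python
import numpy as np

# Synthetic stand-in for a recording: 10,000 random events over a
# 33 ms window on a 346 x 260 array (the DAVIS346 resolution).
rng = np.random.default_rng(0)
n = 10_000
events = np.zeros(n, dtype=[("t", np.uint64), ("x", np.uint16),
                            ("y", np.uint16), ("p", np.uint8)])
events["t"] = np.sort(rng.integers(0, 33_000, n))  # microseconds
events["x"] = rng.integers(0, 346, n)
events["y"] = rng.integers(0, 260, n)
events["p"] = rng.integers(0, 2, n)

def frequency_frame(events, width, height, t0, dt):
    """Event-frequency encoding (in the spirit of Chen, 2018):
    count events per pixel in [t0, t0 + dt) and normalize the
    counts into a grayscale frame."""
    m = (events["t"] >= t0) & (events["t"] < t0 + dt)
    frame = np.zeros((height, width), dtype=np.float32)
    # np.add.at accumulates correctly even for repeated pixel indices.
    np.add.at(frame, (events["y"][m], events["x"][m]), 1.0)
    return frame / frame.max() if frame.max() > 0 else frame

def sae_frame(events, width, height, t0, dt):
    """Surface of Active Events (after Mueggler et al., 2017a):
    each pixel keeps the timestamp of its most recent event,
    scaled to [0, 1] so that newer activity appears brighter."""
    m = (events["t"] >= t0) & (events["t"] < t0 + dt)
    sae = np.zeros((height, width), dtype=np.float64)
    # Keep the latest (largest) timestamp seen at each pixel.
    np.maximum.at(sae, (events["y"][m], events["x"][m]),
                  events["t"][m].astype(np.float64))
    active = sae > 0
    sae[active] = (sae[active] - t0) / dt
    return sae

freq = frequency_frame(events, width=346, height=260, t0=0, dt=33_000)
sae = sae_frame(events, width=346, height=260, t0=0, dt=33_000)
```

Both encodings discard polarity for simplicity; splitting ON and OFF events into separate channels is a common variant.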

Volume 13
DOI 10.3389/fnbot.2019.00038
Language English
Journal Frontiers in Neurorobotics
