2019 IEEE Aerospace Conference | 2019

Unsupervised Upstream Fusion of Multiple Sensing Modalities Using Dynamic Deep Directional-Unit Networks for Event Behavior Characterization

Abstract

The increasing availability of many sensing modalities (imagery, radar, radio frequency (RF) signals, acoustic data, and seismic data) reporting on the same phenomena introduces new data exploitation opportunities. It also creates a need to fuse multiple modalities in order to take advantage of inter-modal dependencies and phenomenology, since a single modality rarely provides complete knowledge of the phenomena of interest. In turn, this raises challenges beyond those of exploiting each modality separately. Traditional approaches, which rely on a cascade of signal processing tasks to detect elements of interest (EOIs) within locations/regions of interest (ROIs), followed by temporal tracking and supervised classification of these EOIs over a sequence of observations, cannot optimally exploit inter-modal characteristics of EOI signatures (e.g., spatio-temporal features that co-vary across modalities). This paper presents an end-to-end spatiotemporal processing pipeline that uses a novel application of dynamic deep generative neural networks to fuse 'raw' and/or feature-level multi-modal and multi-sensor data. The pipeline exploits the learned joint features to perform detection, tracking, and classification of multiple EOI event signatures. Our deep generative learning framework is composed of Conditional Multimodal Deep Directional-Unit Networks, which extend deep generative network models into a general equivariance learning framework with vector-valued visible and hidden units called directional units (DUs). These DUs explicitly represent the sensing state (sensing / not sensing) of each modality as well as environmental context measurements. The direction of a DU indicates which feature within the feature space is present, and the magnitude measures how strongly that feature is present; in this manner, DUs concisely represent a space of features. Furthermore, we introduce a dynamic temporal component into the encoding of the visible and hidden layers. This component supports spatiotemporal multimodal learning tasks, including multimodal fusion, cross-modality learning, and shared representation learning, as well as detection, tracking, and classification of multiple known and unknown EOI classes in an unsupervised and/or semi-supervised manner. The approach overcomes the inadequacy of pre-defined features as a means of creating efficient, discriminating, low-dimensional representations from high-dimensional multi-modality sensor data collected under difficult, dynamic sensing conditions. We present results demonstrating that our approach enables accurate, real-time detection, tracking, and recognition of known and unknown moving or stationary targets or events and of their activities as they evolve over space and time.
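
The paper does not include an implementation, so the following Python sketch is purely illustrative: the names DirectionalUnit and fuse_modalities are hypothetical, and the summation-based fusion is an assumption, not the paper's conditional deep generative model. It only shows the DU bookkeeping described above: a vector whose direction picks out a feature, whose magnitude measures that feature's strength, and an explicit per-modality sensing state that silences non-sensing modalities.

    # Minimal sketch of the directional-unit (DU) idea; names and fusion rule are assumptions.
    import numpy as np

    class DirectionalUnit:
        """Vector-valued unit: direction = which feature, magnitude = feature strength."""

        def __init__(self, vector: np.ndarray, sensing: bool = True):
            self.sensing = sensing
            # A modality that is not sensing contributes a zero vector rather than noise.
            self.vector = vector if sensing else np.zeros_like(vector)

        @property
        def magnitude(self) -> float:
            return float(np.linalg.norm(self.vector))

        @property
        def direction(self) -> np.ndarray:
            m = self.magnitude
            return self.vector / m if m > 0 else np.zeros_like(self.vector)

    def fuse_modalities(units):
        """Naive upstream fusion: sum DU vectors over the modalities that are sensing."""
        active = [u.vector for u in units if u.sensing]
        if not active:
            return DirectionalUnit(np.zeros_like(units[0].vector), sensing=False)
        return DirectionalUnit(np.sum(active, axis=0))

    # Example: radar and acoustic feature vectors for the same EOI; the RF sensor is offline.
    radar = DirectionalUnit(np.array([0.9, 0.1, 0.0]))
    acoustic = DirectionalUnit(np.array([0.7, 0.2, 0.1]))
    rf = DirectionalUnit(np.array([0.0, 0.0, 0.0]), sensing=False)

    fused = fuse_modalities([radar, acoustic, rf])
    print("fused direction:", fused.direction, "strength:", fused.magnitude)

In the actual framework these vector-valued units serve as the visible and hidden units of a dynamic deep generative network, with temporal conditioning across observations, rather than being combined by a fixed sum as in this sketch.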

Pages 1-7
DOI 10.1109/AERO.2019.8742221
Language English
Journal 2019 IEEE Aerospace Conference
