2021 International Conference on Computer Communications and Networks (ICCCN) | 2021

Contrastive Self-Supervised Representation Learning for Sensing Signals from the Time-Frequency Perspective


Abstract


This paper presents a contrastive self-supervised representation learning framework that is novel in being designed specifically for deep learning on frequency-domain data. Contrastive self-supervised representation learning trains neural networks using mostly unlabeled data. It is motivated by the need to reduce the labeling burden of deep learning. In this paper, we are specifically interested in applying this approach to physical sensing scenarios, such as those arising in Internet-of-Things (IoT) applications. Deep neural networks have been widely utilized in IoT applications, but the performance of such models largely depends on the availability of large labeled datasets, whose construction entails significant labeling costs. Motivated by the success of contrastive self-supervised representation learning at substantially reducing the need for labeled data (mostly in the areas of computer vision and natural language processing), there is growing interest in customizing the contrastive learning framework to IoT applications. Most existing work in that space approaches the problem from a time-domain perspective. However, IoT applications often measure physical phenomena whose underlying processes (such as acceleration, vibration, or wireless signal propagation) are fundamentally a function of signal frequencies and thus have sparser and more compact representations in the frequency domain. Recently, this observation motivated the development of Short-Time Fourier Neural Networks (STFNets), which learn directly in the frequency domain and were shown to offer large performance gains over Convolutional Neural Networks (CNNs) when designing supervised learning models for IoT tasks. Hence, in this paper, we introduce an STFNet-based Contrastive Self-supervised representation Learning framework (STF-CSL). STF-CSL takes both time-domain and frequency-domain features into consideration. We build the encoder using STFNet as the fundamental building block.
We also apply both time-domain data augmentation and frequency-domain data augmentation during the self-supervised training process. We evaluate the resulting performance of STF-CSL on various human activity recognition tasks. The evaluation results demonstrate that STF-CSL significantly outperforms time-domain-based self-supervised approaches, thereby substantially enhancing our ability to train deep neural networks from unlabeled data in IoT contexts.
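To make the training recipe concrete, the following is a minimal NumPy sketch of the two ingredients the abstract names: a time-domain augmentation, a frequency-domain augmentation applied to short-time Fourier frames, and a standard NT-Xent contrastive loss over two augmented views. All function names, parameter values, and the choice of jitter/bin-masking are illustrative assumptions, not the paper's actual implementation; in STF-CSL the encoder would be built from STFNet blocks rather than operating on raw NumPy arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_jitter(x, sigma=0.05):
    # Time-domain augmentation (illustrative): add small Gaussian noise
    # to the raw sensor signal.
    return x + rng.normal(0.0, sigma, size=x.shape)

def freq_mask(x, frame_len=32, max_masked_bins=4):
    # Frequency-domain augmentation (illustrative): split the signal into
    # short-time frames, zero out a few random FFT bins in every frame,
    # then reconstruct the signal with the inverse transform.
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    bins = rng.choice(spec.shape[1], size=max_masked_bins, replace=False)
    spec[:, bins] = 0.0
    return np.fft.irfft(spec, n=frame_len, axis=1).reshape(-1)

def nt_xent(z1, z2, temperature=0.5):
    # NT-Xent contrastive loss over two batches of embeddings:
    # z1[i] and z2[i] form a positive pair; all other pairs are negatives.
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

In a full pipeline, each unlabeled signal in a batch would be augmented twice (e.g., one `time_jitter` view and one `freq_mask` view), both views passed through the STFNet-based encoder, and `nt_xent` minimized over the resulting embeddings.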

Pages 1-10
DOI 10.1109/ICCCN52240.2021.9522151
Language English
