International Journal of Speech Technology | 2021

Detecting adversarial attacks on audio-visual speech recognition using deep learning method

Abstract


Deep learning techniques have made significant progress on machine learning tasks across many fields, but deep learning models are prone to adversarial attacks. Adversarial detection methods for audio-visual (AV) streaming data, however, remain largely unexplored. This research proposes a detection process that exploits the temporal connection between the audio and video streams using a Deep Convolutional Neural Network (DCNN). The proposed process detects adversarial attacks against two audio-visual recognition models, namely the Lip-Reading in the Wild (LRW) and Geospatial Repository and Data (GRiD) Management models, which are trained on the corresponding lip-reading datasets. Experimental results indicate that the proposed strategy identifies adversarial attacks more effectively than the Supervised Kernel Machines, Combined Neural Network, and Band Feature Selection methods. The precision, recall, accuracy, and F1-score of the proposed system are 88.10%, 89.30%, 95.60%, and 0.96, respectively, exceeding those of the existing systems.
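
For readers comparing the reported metrics, the F1-score is conventionally defined as the harmonic mean of precision and recall. The following is a minimal sketch under that standard definition only. The abstract does not state how its F1-score was computed (for example, per class, averaged, or on a different evaluation split), and the function name `f1_score` is a hypothetical helper introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition).

    Both inputs are fractions in [0, 1]. Returns 0.0 when both are zero
    to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, converted to fractions.
reported_precision = 0.8810
reported_recall = 0.8930

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 under the standard definition: {f1:.3f}")
```

Under this definition the reported precision and recall give an F1 of roughly 0.89, so the reported 0.96 may reflect a different averaging scheme or evaluation setting that the abstract does not describe.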

Pages 1-7
DOI 10.1007/S10772-021-09859-3
Language English
Journal International Journal of Speech Technology
