2021 25th International Conference on Information Technology (IT) | 2021
Deep learning-based classification of environmental sounds
Abstract
Sound classification has been a major research topic for many years. Sound classification models and intelligent sound recognition systems are based on the analysis of human auditory characteristics. Their applications are numerous and include context awareness, surveillance systems, and crime detection. Deep learning models such as convolutional neural networks (CNNs) have proven very effective at classifying image datasets, so we approach the sound classification problem as an image classification problem. To that end, we represent each sound file by its image representations, namely the mel spectrogram, tonal centroid, spectral contrast, and chromagram, and train a CNN on these representations. The proposed method achieves a mean accuracy of 73% under 10-fold cross-validation. Considering the nature of the dataset and the fact that environmental sounds are considerably harder to classify than music and speech, this result is very satisfactory. The experimental results also show that the proposed deep learning approach outperforms a fully connected neural network (59% accuracy) by a large margin.
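The evaluation protocol named in the abstract, 10-fold cross-validation with a reported mean accuracy, can be sketched as follows. This is a minimal illustration rather than the authors' code: the abstract does not say how the folds were formed, so the shuffled split below is an assumption, and `k_fold_indices`, `cross_validate`, and the `train_and_score` callback are hypothetical names introduced here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    # Assumption: folds are drawn at random; the source does not specify this.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Return the per-fold accuracies and their mean.

    train_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a model on train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return scores, sum(scores) / len(scores)
```

In practice, `train_and_score` would train the classifier on the image representations of the training clips and return its accuracy on the held-out fold; the mean of the ten returned values corresponds to the kind of figure reported in the abstract.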