
Music genre classification using multi-modal deep learning based fusion

 
 

Abstract


Music genre classification is used extensively in almost all music streaming applications and websites, either to recommend playlists to customers (e.g. Spotify, Soundcloud) or as a product in itself (e.g. Shazam and MusixMatch). In this paper, we present a novel approach to classifying a given song by encoding both textual and musical features. The contribution of this work is twofold: i) we propose a multi-modal fusion network that classifies music genre using both textual features (lyrics) and musical features (mel spectrograms), achieving an accuracy of 90.4%; ii) we also propose a multi-frame convolutional recurrent neural network (CRNN) based classifier that applies a K-nearest-neighbor approach over the predictions of every frame to predict the genre of a given song. In the multi-modal fusion approach, we employ co-attention between the textual and musical features to train the classification network. The advantage of the CRNN-based multi-frame approach is that it not only enriches the classification process but also generates more training examples from a smaller number of music files, thereby serving as a form of data augmentation. Our models and code are available at https://github.com/laishawadhwa/Multi-modal-music-genre-classification.
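To make the fusion idea concrete, the following is a minimal sketch (not the authors' released code) of a co-attention fusion classifier in the spirit of the abstract: lyrics token embeddings attend over mel-spectrogram frame features and vice versa, and the fused representation is mapped to genre logits. All layer sizes, module names, and the choice of multi-head attention as the co-attention mechanism are illustrative assumptions.

```python
# Hypothetical sketch of co-attention fusion over lyrics and mel-spectrogram features.
# Dimensions, module names, and attention details are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class CoAttentionFusionClassifier(nn.Module):
    def __init__(self, text_dim=300, audio_dim=128, hidden=256, n_genres=10):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        # Cross-modal (co-)attention: each modality queries the other.
        self.text_to_audio = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_genres)
        )

    def forward(self, lyrics_emb, mel_emb):
        # lyrics_emb: (batch, n_tokens, text_dim), e.g. pretrained word embeddings
        # mel_emb:    (batch, n_frames, audio_dim), e.g. frame features from a CRNN
        t = self.text_proj(lyrics_emb)
        a = self.audio_proj(mel_emb)
        # Lyrics attend to audio frames; audio attends to lyric tokens.
        t_ctx, _ = self.text_to_audio(query=t, key=a, value=a)
        a_ctx, _ = self.audio_to_text(query=a, key=t, value=t)
        # Pool each attended sequence and fuse by concatenation.
        fused = torch.cat([t_ctx.mean(dim=1), a_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Example forward pass with random tensors standing in for real features.
model = CoAttentionFusionClassifier()
logits = model(torch.randn(2, 50, 300), torch.randn(2, 120, 128))
print(logits.shape)  # torch.Size([2, 10])
```

In this sketch, mean pooling and concatenation stand in for whatever aggregation the full paper uses; the essential point is that each modality's representation is conditioned on the other before classification.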

Pages 1-5
DOI 10.1109/GHCI50508.2021.9514020
Language English
Journal 2021 Grace Hopper Celebration India (GHCI)
