tm - Technisches Messen | 2021

Influence of input data representations for time-dependent instrument recognition

 
 

Abstract


An important preprocessing step for several music signal processing algorithms is the estimation of the instruments playing in a music recording. To this end, time-dependent instrument recognition is realized in this approach by a neural network with residual blocks. Since music signal processing tasks use diverse time-frequency representations as input matrices, this work analyzes the influence of different input representations on instrument recognition. Two kinds of input are investigated: three-dimensional inputs that combine short-time Fourier transform (STFT) magnitudes with an additional time-frequency representation based on phase information, and two-dimensional inputs of STFT or constant-Q transform (CQT) magnitudes. The additional phase representations are the product spectrum (PS), which is based on the modified group delay, and the frequency error (FE) matrix, which is related to the instantaneous frequency. Training and evaluation use the MusicNet dataset, which enables the estimation of seven instruments. A higher number of frequency bins in the input representations improves instrument recognition by about 2 % in F1-score. Compared to the literature, frame-level instrument recognition can be improved for different input representations.
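The idea of a three-dimensional input (a magnitude channel stacked with a phase-derived channel) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stft`, `wrap_phase` and `two_channel_input`, the frame length, and the hop size are assumptions made for illustration, and the second channel is a generic phase-vocoder frequency deviation used as a stand-in for a phase-based representation, not the FE matrix or the product spectrum as defined in the paper.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Naive Hann-windowed STFT; returns frames[t][k] for bins k = 0..frame_len//2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def wrap_phase(phi):
    """Wrap an angle to the interval [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def two_channel_input(signal, frame_len=64, hop=32):
    """Return [magnitude, deviation], each indexed as [frame][bin].

    The deviation channel (hypothetical stand-in for a phase-based input)
    is the offset, in bins, between the frequency implied by the phase
    advance of two consecutive frames and the bin centre frequency.
    Values in bins with negligible magnitude are essentially noise.
    """
    spec = stft(signal, frame_len, hop)
    magnitude = [[abs(x) for x in frame] for frame in spec]
    phase = [[cmath.phase(x) for x in frame] for frame in spec]
    deviation = [[0.0] * len(spec[0])]  # no previous frame for t = 0
    for t in range(1, len(spec)):
        row = []
        for k in range(len(spec[t])):
            # Phase advance expected for a sinusoid exactly at bin centre k.
            expected = 2 * math.pi * k * hop / frame_len
            delta = wrap_phase(phase[t][k] - phase[t - 1][k] - expected)
            row.append(delta * frame_len / (2 * math.pi * hop))
        deviation.append(row)
    return [magnitude, deviation]
```

Calling `two_channel_input` on a list of samples yields two matrices of identical shape that could be stacked as channels for a network; using only the first matrix corresponds to the two-dimensional magnitude input. In practice a library STFT or CQT would replace the naive transform shown here.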

Volume 88
Pages 274 - 281
DOI 10.1515/teme-2020-0100
Language English
Journal tm - Technisches Messen
