Anders Elowsson
Royal Institute of Technology
Publication
Featured research published by Anders Elowsson.
Journal of the Acoustical Society of America | 2015
Anders Elowsson; Anders Friberg
A system is proposed in which rhythmic representations are used to model the perception of tempo in music. The system can be understood as a five-layered model, where representations are transformed into higher-level abstractions in each layer. First, source separation is applied (Audio Level), onsets are detected (Onset Level), and interonset relationships are analyzed (Interonset Level). Then, several high-level representations of rhythm are computed (Rhythm Level). The periodicity of the music is modeled by the cepstroid vector, the periodicity of an interonset interval (IOI) histogram. The pulse strength for plausible beat length candidates is defined by computing the magnitudes in different IOI histograms. The speed of the music is modeled as a continuous function on the basis of the idea that such a function corresponds to the underlying perceptual phenomena, and it seems to effectively reduce octave errors. By combining the rhythmic representations in a logistic regression framework, the tempo of the music is finally computed (Tempo Level). The results are the highest reported in a formal benchmarking test (2006-2013), with a P-Score of 0.857. Furthermore, the highest results so far are reported for two widely adopted test sets, with an Acc1 of 77.3% and 93.0% for the Songs and Ballroom datasets.
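As a rough illustration of the Interonset and Rhythm Levels described in this abstract, the Python sketch below builds an IOI histogram from onset times and reads off a dominant periodicity. It is a minimal, hypothetical example, not the authors' system: the function names and the simple argmax beat-length estimate stand in for the far richer cepstroid and pulse-strength representations used in the paper.

# Minimal sketch (not the paper's implementation): build an interonset-interval
# (IOI) histogram from detected onset times and read off a dominant periodicity.
import numpy as np

def ioi_histogram(onset_times, max_ioi=2.0, bin_width=0.01):
    """Histogram of intervals between all pairs of nearby onsets (seconds)."""
    onsets = np.sort(np.asarray(onset_times))
    iois = []
    for i in range(len(onsets)):
        for j in range(i + 1, len(onsets)):
            ioi = onsets[j] - onsets[i]
            if ioi > max_ioi:
                break
            iois.append(ioi)
    bins = np.arange(0.0, max_ioi + bin_width, bin_width)
    hist, edges = np.histogram(iois, bins=bins)
    return hist, edges

def dominant_periodicity(hist, edges):
    """Pick the IOI bin with the largest count as a crude beat-length estimate."""
    idx = int(np.argmax(hist))
    return 0.5 * (edges[idx] + edges[idx + 1])

# Example: onsets roughly every 0.5 s suggest a tempo near 120 BPM.
onsets = np.arange(0, 10, 0.5) + 0.01 * np.random.randn(20)
hist, edges = ioi_histogram(onsets)
beat_length = dominant_periodicity(hist, edges)
print(f"Estimated beat length: {beat_length:.2f} s (~{60 / beat_length:.0f} BPM)")

In the paper itself, such periodicity evidence is combined with several other rhythmic representations in a logistic regression framework rather than read off a single histogram peak.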
Journal of the Acoustical Society of America | 2017
Anders Elowsson; Anders Friberg
By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling the parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Previously, ground truth ratings of performed dynamics had been collected by asking listeners to rate how soft/loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The highest result was produced by an ensemble of multilayer perceptrons with an R2 of 0.84. This result seems to be close to the upper bound, given the estimated uncertainty of the ground truth data. The result is well above that of individual human listeners in the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. Features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
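The sketch below shows one plausible way a sectional spectral flux could be computed: frame-wise, half-wave-rectified spectral differences averaged over fixed-length sections of the signal. It is a simplified, hypothetical feature, assuming plain FFT frames and one-second sections, and does not reproduce the study's actual feature extraction or its source-separated inputs.

# Simplified sketch (hypothetical, not the study's feature extractor): a
# "sectional" spectral flux, i.e. spectral flux summarized over fixed-length
# sections, as one coarse descriptor of spectral fluctuation.
import numpy as np

def sectional_spectral_flux(x, sr, frame_len=2048, hop=512, section_sec=1.0):
    """Frame-wise positive spectral flux, averaged within consecutive sections."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spectra = np.stack([
        np.abs(np.fft.rfft(window * x[i * hop:i * hop + frame_len]))
        for i in range(n_frames)
    ])
    # Half-wave rectified frame-to-frame spectral difference.
    flux = np.sum(np.maximum(np.diff(spectra, axis=0), 0.0), axis=1)
    frames_per_section = max(1, int(section_sec * sr / hop))
    n_sections = len(flux) // frames_per_section
    flux = flux[:n_sections * frames_per_section]
    return flux.reshape(n_sections, frames_per_section).mean(axis=1)

# Example on three seconds of a 440 Hz tone with a crescendo.
sr = 22050
t = np.arange(sr * 3) / sr
x = np.linspace(0.1, 1.0, len(t)) * np.sin(2 * np.pi * 440 * t)
print(sectional_spectral_flux(x, sr))

In the study, features of this kind were fed, together with spectral-characteristic features, into the machine learning models described above.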
Journal of the Acoustical Society of America | 2018
Anders Friberg; Tony Lindeberg; Martin Hellwagner; Pétur Helgason; Gláucia Laís Salomão; Anders Elowsson; Guillaume Lemaitre; Sten Ternström
Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence, has been modeled from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8% for phonation, 90.8% for supraglottal myoelastic vibrations, and 89.0% for turbulence using all 84 developed features. A final feature reduction to 22 features yielded similar results.
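As an illustration of this kind of classification setup, the following sketch trains an ensemble of multilayer perceptrons combined by soft voting and evaluates it with cross-validation in scikit-learn. The feature matrix and labels are random placeholders, not the 84 (or reduced 22) features developed in the study, and the ensemble configuration is an assumption rather than the paper's exact model.

# Illustrative sketch only (random placeholder features, not the paper's
# hand-crafted features): an ensemble of multilayer perceptrons whose averaged
# probabilities give a cross-validated decision for one articulatory category.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 22))      # placeholder feature matrix
y = rng.integers(0, 2, size=200)    # placeholder labels, e.g. phonation yes/no

# Several small MLPs with different seeds, combined by soft voting.
ensemble = VotingClassifier(
    estimators=[
        (f"mlp{seed}", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                          random_state=seed)))
        for seed in range(5)
    ],
    voting="soft",
)
scores = cross_val_score(ensemble, X, y, cv=5)
print(f"Cross-validated accuracy: {scores.mean():.3f}")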
Journal of the Acoustical Society of America | 2014
Anders Friberg; Erwin Schoonderwaldt; Anton Hedblad; Marco Fabiani; Anders Elowsson
International Society for Music Information Retrieval (ISMIR) Conference | 2013
Anders Elowsson; Anders Friberg; Guy Madison; Johan Paulin
Sound and Music Computing Conference (SMC 2013), Stockholm, Sweden, 30 July-3 August 2013 | 2013
Anders Elowsson; Anders Friberg
International Society for Music Information Retrieval (ISMIR) Conference | 2016
Anders Elowsson
arXiv: Information Retrieval | 2014
Anders Friberg; Erwin Schoonderwaldt; Anton Hedblad; Marco Fabiani; Anders Elowsson
arXiv: Sound | 2018
Anders Elowsson
The 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music | 2012
Anders Elowsson; Anders Friberg