Antti Eronen
Nokia
Publications
Featured research published by Antti Eronen.
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Anssi Klapuri; Antti Eronen; Jaakko Astola
A method is described which analyzes the basic pattern of beats in a piece of music, the musical meter. The analysis is performed jointly at three different time scales: at the temporally atomic tatum pulse level, at the tactus pulse level which corresponds to the tempo of a piece, and at the musical measure level. Acoustic signals from arbitrary musical genres are considered. For the initial time-frequency analysis, a new technique is proposed which measures the degree of musical accent as a function of time at four different frequency ranges. This is followed by a bank of comb filter resonators which extracts features for estimating the periods and phases of the three pulses. The features are processed by a probabilistic model which represents primitive musical knowledge and uses the low-level observations to perform joint estimation of the tatum, tactus, and measure pulses. The model takes into account the temporal dependencies between successive estimates and enables both causal and noncausal analysis. The method is validated using a manually annotated database of 474 music signals from various genres. The method works robustly for different types of music and improves over two state-of-the-art reference methods in simulations.
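The comb-filter resonator stage described above can be sketched minimally as follows. This is an illustration rather than the authors' implementation: the first-order recursion, the feedback gain, and the impulse-train accent signal are all simplifying assumptions.

```python
# Minimal sketch of a comb-filter resonator bank for pulse-period analysis.
# A resonator with delay tau reinforces an accent signal that repeats every
# tau samples, so its output energy peaks at the underlying pulse period.

def comb_resonator_energy(accent, tau, alpha=0.9):
    """Run the accent signal through y[n] = (1 - alpha) * x[n] + alpha * y[n - tau]
    and return the mean output energy."""
    y = [0.0] * len(accent)
    for n, x in enumerate(accent):
        feedback = y[n - tau] if n >= tau else 0.0
        y[n] = (1.0 - alpha) * x + alpha * feedback
    return sum(v * v for v in y) / len(y)

def best_period(accent, candidate_lags, alpha=0.9):
    """Pick the lag whose resonator responds most strongly."""
    return max(candidate_lags,
               key=lambda tau: comb_resonator_energy(accent, tau, alpha))

# Toy accent signal with a pulse every 8 samples:
accent = [1.0 if n % 8 == 0 else 0.0 for n in range(200)]
print(best_period(accent, range(2, 20)))  # 8
```

In the paper, the resonator outputs feed a probabilistic model that jointly estimates the tatum, tactus, and measure pulses; the sketch shows only a raw period estimate.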
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Antti Eronen; Vesa T. Peltonen; Juha T. Tuomi; Anssi Klapuri; Seppo Fagerlund; Timo Sorsa; Gaëtan Lorho; Jyri Huopaniemi
The aim of this paper is to investigate the feasibility of an audio-based context recognition system. Here, context recognition refers to the automatic classification of the context or environment around a device. A system is developed and its accuracy is compared to that of human listeners performing the same task. Particular emphasis is placed on the computational complexity of the methods, since the application is of particular interest in resource-constrained portable devices. Simple low-dimensional feature vectors are evaluated against more standard spectral features. Using discriminative training, competitive recognition accuracies are achieved with very low-order hidden Markov models (1-3 Gaussian components). A slight improvement in recognition accuracy is observed when linear data-driven feature transformations are applied to mel-cepstral features. The recognition rate of the system as a function of the test sequence length appears to converge only after about 30 to 60 s, although some degree of accuracy is achieved even with test sequences shorter than 1 s. The average reaction time of the human listeners was 14 s, i.e., somewhat shorter than, but of the same order as, that of the system. In distinguishing between 24 everyday contexts, the average recognition accuracy was 58% for the system, against the 69% obtained in the listening tests; in recognizing six high-level classes, the accuracies were 82% for the system and 88% for the subjects.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2000
Antti Eronen; Anssi Klapuri
In this paper, a system for pitch independent musical instrument recognition is presented. A wide set of features covering both spectral and temporal properties of sounds was investigated, and their extraction algorithms were designed. The usefulness of the features was validated using test data that consisted of 1498 samples covering the full pitch ranges of 30 orchestral instruments from the string, brass and woodwind families, played with different techniques. The correct instrument family was recognized with 94% accuracy and individual instruments in 80% of cases. These results are compared to those reported in other work. Also, utilization of a hierarchical classification framework is considered.
Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2001
Antti Eronen
Several features were compared with regard to recognition performance in a musical instrument recognition system. Both mel-frequency and linear prediction cepstral and delta cepstral coefficients were calculated. Linear prediction analysis was carried out both on a uniform and a warped frequency scale, and reflection coefficients were also used as features. The performance of previously described features relating to the temporal development, modulation properties, brightness, and spectral synchrony of sounds was also analysed. The database consisted of 5286 acoustic and synthetic solo tones from 29 different Western orchestral instruments, of which 16 were included in the test set. The best performance for solo tone recognition, 35% for individual instruments and 77% for families, was obtained with a feature set consisting of two sets of mel-frequency cepstral coefficients and a subset of the other analysed features. The confusions made by the system were analysed and compared to results reported in a human perception experiment.
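The delta cepstral coefficients compared above are conventionally computed as a linear regression over neighbouring frames. The sketch below shows this generic formulation; the window length N=2 and edge-padding convention are illustrative assumptions, not necessarily the paper's settings.

```python
# Regression-based delta features:
#   d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
# Edge frames are padded by repetition, a common convention.

def delta(frames, N=2):
    T = len(frames)
    norm = 2.0 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[0])
        for n in range(1, N + 1):
            plus = frames[min(t + n, T - 1)]    # frame t+n, clamped at the end
            minus = frames[max(t - n, 0)]       # frame t-n, clamped at the start
            for i in range(len(d)):
                d[i] += n * (plus[i] - minus[i]) / norm
        out.append(d)
    return out

# Linearly rising one-dimensional cepstra give a constant interior delta:
cepstra = [[float(t)] for t in range(6)]
print(delta(cepstra)[2])  # [1.0]
```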
EURASIP Journal on Audio, Speech, and Music Processing | 2013
Toni Heittola; Annamaria Mesaros; Antti Eronen; Tuomas Virtanen
The work presented in this article studies how context information can be used in the automatic sound event detection process, and how the detection system can benefit from such information. Humans use context information to make more accurate predictions about sound events and to rule out unlikely events given the context. We propose a similar utilization of context information in the automatic sound event detection process. The proposed approach is composed of two stages: an automatic context recognition stage and a sound event detection stage. Contexts are modeled using Gaussian mixture models and sound events are modeled using three-state left-to-right hidden Markov models. In the first stage, the audio context of the tested signal is recognized. Based on the recognized context, a context-specific set of sound event classes is selected for the sound event detection stage, which also uses context-dependent acoustic models and count-based event priors. Two alternative event detection approaches are studied. In the first, a monophonic event sequence is output by detecting the most prominent sound event at each time instance using Viterbi decoding. The second introduces a new method for producing a polyphonic event sequence by detecting multiple overlapping sound events with multiple restricted Viterbi passes. A new metric is introduced to evaluate sound event detection performance at various levels of polyphony. It combines the detection accuracy and a coarse time-resolution error into one figure, making comparison between detection algorithms simpler. The two-stage approach was found to improve the results substantially compared to the context-independent baseline system: at the block level, the detection accuracy can be almost doubled by using the proposed context-dependent event detection.
Information Sciences, Signal Processing and Their Applications | 2003
Antti Eronen
In this paper, we describe a system for the recognition of musical instruments from isolated notes or drum samples. We first describe a baseline system that uses mel-frequency cepstral coefficients and their first derivatives as features, and continuous-density hidden Markov models (HMMs). Two improvements are proposed to increase the performance of this baseline system. First, transforming the features to a base with maximal statistical independence using independent component analysis can give an improvement of 9 percentage points in recognition accuracy. Secondly, discriminative training is shown to further improve the recognition accuracy of the system. The evaluation material consists of 5895 isolated notes of Western orchestral instruments, and 1798 drum hits.
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Antti Eronen; Anssi Klapuri
An approach for tempo estimation from musical pieces with near-constant tempo is proposed. The method consists of three main steps: measuring the degree of musical accent as a function of time, periodicity analysis, and tempo estimation. Novel accent features based on the chroma representation are proposed. The periodicity of the accent signal is measured using the generalized autocorrelation function, followed by tempo estimation using k-Nearest Neighbor regression. We propose a resampling step applied to an unknown periodicity vector before finding the nearest neighbors. This step improves the performance of the method significantly. The tempo estimate is computed as a distance-weighted median of the nearest neighbor tempi. Experimental results show that the proposed method provides significantly better tempo estimation accuracies than three reference methods.
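The periodicity analysis and k-NN regression steps can be sketched roughly as below, under several stated simplifications: plain autocorrelation stands in for the generalized autocorrelation function, a distance-weighted mean stands in for the distance-weighted median, the resampling step is omitted, and the tiny training set is invented.

```python
# Step 1: periodicity analysis of an accent signal (plain autocorrelation
# as a simplified stand-in for the generalized autocorrelation function).
def autocorr(x, max_lag):
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    return [sum(xc[n] * xc[n - lag] for n in range(lag, len(xc)))
            for lag in range(1, max_lag + 1)]

# Step 2: k-NN tempo regression over (periodicity vector, tempo) pairs.
def knn_tempo(query, train, k=3):
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    weights = [1.0 / (dist(vec, query) + 1e-9) for vec, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)

# An accent pulse every 10 samples shows up as the strongest lag:
accent = [1.0 if n % 10 == 0 else 0.0 for n in range(100)]
ac = autocorr(accent, 20)
print(ac.index(max(ac)) + 1)  # 10
```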
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2003
Antti Eronen; Juha T. Tuomi; Anssi Klapuri; Seppo Fagerlund; Timo Sorsa; Gaëtan Lorho; Jyri Huopaniemi
The paper concerns the development of a system for the recognition of a context or environment based on acoustic information only. Our system uses mel-frequency cepstral coefficients and their derivatives as features, and continuous-density hidden Markov models (HMMs) as acoustic models. We evaluate different model topologies and training methods for HMMs and show that discriminative training can yield a 10% reduction in error rate compared to maximum-likelihood training. A listening test is conducted to study human accuracy in the task and to obtain a baseline for assessing the performance of the system. Direct comparison to human performance indicates that the system performs somewhat worse than human subjects in recognizing 18 everyday contexts, and almost comparably in recognizing six higher-level categories.
Sensors | 2014
Jussi Parviainen; Jayaprasad Bojja; Jussi Collin; Jussi Leppänen; Antti Eronen
In this paper, an adaptive activity and environment recognition algorithm running on a mobile phone is presented. The algorithm makes inferences based on sensor and radio receiver data provided by the phone. A wide set of features that can be extracted from these data sources were investigated, and a Bayesian maximum a posteriori classifier was used for classifying between several user activities and environments. The accuracy of the method was evaluated on a dataset collected in a real-life trial. In addition, comparison to other state-of-the-art classifiers, namely support vector machines and decision trees, was performed. To make the system adaptive for individual user characteristics, an adaptation algorithm for context model parameters was designed. Moreover, a confidence measure for the classification correctness was designed. The proposed adaptation algorithm and confidence measure were evaluated on a second dataset obtained from another real-life trial, where the users were requested to provide binary feedback on the classification correctness. The results show that the proposed adaptation algorithm is effective at improving the classification accuracy.
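A Bayesian maximum a posteriori classifier of the kind used above can be sketched with class-conditional Gaussian likelihoods. This is a naive-Bayes simplification; the features, classes, and model parameters below are invented for illustration and are not from the paper.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def map_classify(features, models, priors):
    """Return argmax_c [ log P(c) + sum_i log p(x_i | c) ]."""
    def score(c):
        return math.log(priors[c]) + sum(
            gaussian_logpdf(x, m, v) for x, (m, v) in zip(features, models[c]))
    return max(models, key=score)

# Toy models over two features (accelerometer variance, Wi-Fi AP count):
models = {"walking": [(2.0, 0.5), (3.0, 1.0)],
          "still":   [(0.1, 0.05), (8.0, 2.0)]}
priors = {"walking": 0.5, "still": 0.5}
print(map_classify([1.8, 3.5], models, priors))  # walking
```

User adaptation, as described in the paper, would amount to updating the per-class means, variances, and priors from the feedback a user provides.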
Mobile and Ubiquitous Multimedia | 2010
Jukka Holm; Arto Juhani Lehtiniemi; Antti Eronen
This paper studies the idea of using avatars as a user interface for discovering new music. In the evaluated prototype, the user builds an avatar from three parts (head, body and background), the appearance of each reflecting a certain musical genre. Based on the selected combination of parts, the application generates a new playlist by seeding a content-based music recommender with examples from the selected genres. In a user study with 40 participants, the prototype was considered entertaining and easy to use. The concept inspired users to explore new music and provided faster access to cross-genre playlists than traditional music player applications. In longer-term use, however, the prototype was slightly too simple and would have benefited from, e.g., a text-based search functionality. Several other interesting ideas for future development of the concept were also received.