Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Perfecto Herrera is active.

Publication


Featured research published by Perfecto Herrera.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification

Joan Serrà; Emilia Gómez; Perfecto Herrera; Xavier Serra

We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detecting cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the music information retrieval (MIR) community in recent years, as it provides a direct and objective way to evaluate music similarity algorithms. This paper first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We study several of their components (such as chroma resolution and similarity, transposition, beat tracking, and dynamic time warping constraints) in order to discover which characteristics are desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are applied to the newly proposed method. Multiple evaluations of the new method confirm a large increase in identification accuracy over alternative state-of-the-art approaches.
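The core ideas (a binary frame-to-frame similarity, local subsequence alignment, and transposition handling via chroma rotation) can be sketched in a toy form. This is an illustrative simplification under stated assumptions, not the authors' exact algorithm: frames are 12-bin chroma vectors, "similar" is reduced to sharing a dominant pitch class, and alignment is a plain Smith-Waterman-style recurrence.

```python
# Toy sketch of binary chroma similarity + local alignment for cover detection.
# Simplified for illustration; not the method from the paper.

def binary_sim(a, b):
    """+1 if two 12-bin chroma frames share a dominant pitch class, else -1."""
    return 1 if a.index(max(a)) == b.index(max(b)) else -1

def local_alignment_score(seq_a, seq_b, gap=-0.5):
    """Smith-Waterman-style local alignment over binary frame similarities."""
    rows, cols = len(seq_a), len(seq_b)
    H = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = H[i - 1][j - 1] + binary_sim(seq_a[i - 1], seq_b[j - 1])
            H[i][j] = max(0.0, match, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

def transposition_invariant_score(seq_a, seq_b):
    """Try all 12 chroma rotations of seq_b and keep the best alignment score."""
    return max(
        local_alignment_score(seq_a, [f[k:] + f[:k] for f in seq_b])
        for k in range(12)
    )
```

A cover transposed to another key scores as highly as the original once all twelve rotations are tried, which is why transposition handling matters for this task.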


International Conference on Machine Learning and Applications | 2008

Multimodal Music Mood Classification Using Audio and Lyrics

Cyril Laurier; Jens Grivolla; Perfecto Herrera

In this paper we present a study on music mood classification using audio and lyrics information. The mood of a song is expressed by means of musical features, but a relevant part also seems to be conveyed by the lyrics. We evaluate each factor independently and explore the possibility of combining both, using natural language processing and music information retrieval techniques. We show that standard distance-based methods and latent semantic analysis are able to classify the lyrics significantly better than random, but the performance is still quite inferior to that of audio-based techniques. We then introduce a method based on differences between language models that yields performance closer to that of audio-based classifiers. Moreover, integrating this into a multimodal system (audio+text) improves the overall performance. We demonstrate that lyrics and audio information are complementary and can be combined to improve a classification system.
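One common way to combine modalities like these is late fusion: each classifier outputs per-class probabilities, and the system averages them with a weight per modality. The sketch below is a hypothetical illustration of that idea (the weight and the probability values are placeholders, not figures from the paper):

```python
# Hedged sketch of late fusion for multimodal mood classification.
# Classifier outputs and the 0.7 audio weight are hypothetical placeholders.

def fuse_predictions(p_audio, p_lyrics, w_audio=0.7):
    """Weighted average of per-class mood probabilities from two modalities."""
    w_lyrics = 1.0 - w_audio
    return {
        mood: w_audio * p_audio[mood] + w_lyrics * p_lyrics[mood]
        for mood in p_audio
    }

def predict_mood(p_audio, p_lyrics, w_audio=0.7):
    """Pick the mood with the highest fused probability."""
    fused = fuse_predictions(p_audio, p_lyrics, w_audio)
    return max(fused, key=fused.get)
```

With complementary modalities, the fused decision can differ from either classifier alone, which is the effect the multimodal (audio+text) system exploits.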


Advances in Music Information Retrieval | 2010

Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation, and Beyond

Joan Serrà; Emilia Gómez; Perfecto Herrera

A cover version is an alternative rendition of a previously recorded song. Given that a cover may differ from the original song in timbre, tempo, structure, key, arrangement, or language of the vocals, automatically identifying cover songs in a given music collection is a rather difficult task. The music information retrieval (MIR) community has paid much attention to this task in recent years and many approaches have been proposed. This chapter comprehensively summarizes the work done in cover song identification while encompassing the background related to this area of research. The most promising strategies are reviewed and qualitatively compared under a common framework, and their evaluation methodologies are critically assessed. A discussion on the remaining open issues and future lines of research closes the chapter.


International Conference on Music and Artificial Intelligence | 2002

Automatic Classification of Drum Sounds: A Comparison of Feature Selection Methods and Classification Techniques

Perfecto Herrera; Alexandre Yeterian; Fabien Gouyon

We present a comparative evaluation of automatic classification of a sound database containing more than six hundred drum sounds (kick, snare, hi-hat, toms, and cymbals). A preliminary set of fifty descriptors has been refined with the help of different techniques, and some final reduced sets of around twenty features have been selected as the most relevant. We have then tested different classification techniques (instance-based, statistical-based, and tree-based) using ten-fold cross-validation. Three levels of taxonomic classification have been tested: membranes versus plates (super-category level); kick vs. snare vs. hi-hat vs. toms vs. cymbals (basic level); and some basic classes (kick and snare) plus some sub-classes, i.e., ride, crash, open hi-hat, closed hi-hat, high tom, medium tom, and low tom (sub-category level). Very high hit-rates have been achieved (99%, 97%, and 90%, respectively) with several of the tested techniques.
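The ten-fold cross-validation protocol used here is easy to sketch: shuffle the labelled examples, hold out each tenth in turn, train on the rest, and average the hit rate. The version below is a minimal illustration with a 1-nearest-neighbour classifier standing in for the instance-based techniques (the data, classifier, and fold assignment are simplified assumptions, not the paper's setup):

```python
# Minimal sketch of ten-fold cross-validated hit-rate estimation,
# using 1-nearest-neighbour as a stand-in instance-based classifier.
import random

def nearest_neighbour_label(train, query):
    """Label of the training example closest to `query` (squared Euclidean)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda ex: dist(ex[0], query))[1]

def cross_validated_hit_rate(examples, folds=10, seed=0):
    """Shuffle, hold out each fold in turn, and average the hit rate."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    hits = total = 0
    for k in range(folds):
        test = data[k::folds]  # every folds-th example is held out
        train = [ex for i, ex in enumerate(data) if i % folds != k]
        for features, label in test:
            hits += nearest_neighbour_label(train, features) == label
            total += 1
    return hits / total
```

On well-separated classes this reaches a 100% hit rate; on real drum descriptors the reported rates (99%, 97%, 90%) drop as the taxonomy gets finer.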


NeuroImage | 2013

The roles of superficial amygdala and auditory cortex in music-evoked fear and joy

Stefan Koelsch; Stavros Skouras; Thomas Fritz; Perfecto Herrera; Corinna E. Bonhage; Mats B. Küssner; Arthur M. Jacobs

This study investigates neural correlates of music-evoked fear and joy with fMRI. Studies on neural correlates of music-evoked fear are scant, and there are only a few studies on neural correlates of joy in general. Eighteen individuals listened to excerpts of fear-evoking, joy-evoking, as well as neutral music and rated their own emotional state in terms of valence, arousal, fear, and joy. Results show that BOLD signal intensity increased during joy, and decreased during fear (compared to the neutral condition) in bilateral auditory cortex (AC) and bilateral superficial amygdala (SF). In the right primary somatosensory cortex (area 3b), BOLD signals increased during exposure to fear-evoking music. While emotion-specific activity in AC increased with increasing duration of each trial, SF responded phasically at the beginning of the stimulus, after which SF activity declined. Psychophysiological Interaction (PPI) analysis revealed extensive emotion-specific functional connectivity of AC with insula, cingulate cortex, as well as with visual and parietal attentional structures. These findings show that the auditory cortex functions as a central hub of an affective-attentional network that is more extensive than previously believed. PPI analyses also showed functional connectivity of SF with AC during the joy condition, taken to reflect that SF is sensitive to social signals with positive valence. During fear music, SF showed functional connectivity with visual cortex and area 7 of the superior parietal lobule, taken to reflect increased visual alertness and an involuntary shift of attention during the perception of auditory signals of danger.


Information Processing and Management | 2013

Semantic audio content-based music recommendation and visualization based on user preference examples

Dmitry Bogdanov; Martín Haro; Ferdinand Fuhrmann; Anna Xambó; Emilia Gómez; Perfecto Herrera

Preference elicitation is a challenging fundamental problem when designing recommender systems. In the present work we propose a content-based technique to automatically generate a semantic representation of the user's musical preferences directly from audio. Starting from an explicit set of music tracks provided by the user as evidence of his/her preferences, we infer high-level semantic descriptors for each track, obtaining a user model. To prove the benefits of our proposal, we present two applications of our technique. In the first, we consider three approaches to music recommendation: two based on a semantic music similarity measure, and one based on a semantic probabilistic model. In the second application, we address the visualization of the user's musical preferences by creating a humanoid cartoon-like character, the Musical Avatar, automatically inferred from the semantic representation. We conducted a preliminary evaluation of the proposed technique in the context of these applications with 12 subjects. The results are promising: the recommendations were positively evaluated and close to those coming from state-of-the-art metadata-based systems, and the subjects judged the generated visualizations to capture their core preferences. Finally, we highlight the advantages of the proposed semantic user model for enhancing the user interfaces of information filtering systems.
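A semantic-similarity recommender of this kind can be sketched very compactly: average the semantic descriptor vectors of the user's example tracks into a user model, then rank candidates by cosine similarity to that model. The descriptor vectors and track names below are hypothetical, and averaging plus cosine ranking is only one plausible reading of "semantic music similarity measure":

```python
# Illustrative sketch: rank candidate tracks by cosine similarity between
# their semantic descriptors and a user model averaged from example tracks.
# Descriptor values and track names are hypothetical.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def user_model(preference_tracks):
    """Average the semantic descriptors of the user's example tracks."""
    n = len(preference_tracks)
    return [sum(col) / n for col in zip(*preference_tracks)]

def recommend(preference_tracks, candidates, top_n=2):
    """Rank (name, descriptors) candidates by similarity to the user model."""
    model = user_model(preference_tracks)
    ranked = sorted(candidates, key=lambda kv: cosine(model, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

Because the model lives in a semantic descriptor space rather than a metadata space, the same vector can also drive a visualization such as the Musical Avatar.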


ACM Multimedia | 2013

ESSENTIA: an open-source library for sound and music analysis

Dmitry Bogdanov; Nicolas Wack; Emilia Gómez; Sankalp Gulati; Perfecto Herrera; Oscar Mayor; Gerard Roma; Justin Salamon; José R. Zapata; Xavier Serra

We present Essentia 2.0, an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. The library is also wrapped in Python and includes a number of predefined executable extractors for the available music descriptors, which facilitates its use for fast prototyping and allows setting up research experiments very rapidly. Furthermore, it includes a Vamp plugin to be used with Sonic Visualiser for visualization purposes. The library is cross-platform and currently supports Linux, Mac OS X, and Windows systems. Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the music descriptors included out of the box and the signal processing algorithms, is easily expandable and allows for both research experiments and development of large-scale industrial applications.


International Conference on Acoustics, Speech, and Signal Processing | 2002

Pulse-dependent analyses of percussive music

Fabien Gouyon; Perfecto Herrera; Pedro Cano

We report on a method for the automatic extraction of a metrical attribute from percussive music audio signals: the smallest rhythmic pulse, called the "tick". The relevance of using this feature in the framework of subsequent analyses is discussed and evaluated.
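The intuition behind a smallest rhythmic pulse is that it should (approximately) divide every inter-onset interval in the signal. The toy sketch below estimates such a pulse from a list of onset times by searching for the largest candidate period that divides all intervals within a tolerance. It assumes onset times are already detected and is illustrative only, not the paper's extraction method:

```python
# Toy sketch: estimate the smallest rhythmic pulse ("tick") as an
# approximate common divisor of inter-onset intervals.
# Assumes onset times (in seconds) are already available.

def inter_onset_intervals(onsets):
    """Differences between consecutive onset times."""
    return [b - a for a, b in zip(onsets, onsets[1:])]

def estimate_tick(onsets, step=0.005, tolerance=0.01):
    """Largest period (>= step) that approximately divides every interval."""
    iois = inter_onset_intervals(onsets)
    tick = min(iois)  # the tick can be no longer than the shortest interval
    while tick >= step:
        if all(abs(ioi - round(ioi / tick) * tick) <= tolerance for ioi in iois):
            return round(tick, 3)
        tick -= step
    return None
```

Searching from the largest candidate downward avoids the degenerate solution where a vanishingly small period trivially "divides" everything.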


Multimedia Tools and Applications | 2010

Indexing music by mood: design and integration of an automatic content-based annotator

Cyril Laurier; Owen Meyers; Joan Serrà; Martin Blech; Perfecto Herrera; Xavier Serra

In the context of content analysis for indexing and retrieval, a method for creating automatic music mood annotations is presented. The method is based on results from psychological studies and framed as a supervised learning approach using musical features automatically extracted from the raw audio signal. We present here some of the most relevant audio features for solving this problem. A ground truth, used for training, is created using both social network information systems (wisdom of crowds) and individual experts (wisdom of the few). At the experimental level, we evaluate our approach on a database of 1,000 songs. Tests of different classification methods, configurations and optimizations have been conducted, showing that Support Vector Machines perform best for the task at hand. Moreover, we evaluate the algorithm's robustness against different audio compression schemes. This aspect, often neglected, is fundamental to building a system that is usable in real conditions. In addition, the integration of a fast and scalable version of this technique with the European Project PHAROS is discussed. This real-world application demonstrates the usability of this tool for annotating large-scale databases. We also report on a user evaluation in the context of the PHAROS search engine, asking people about the utility, interest and innovation of this technology in real-world use cases.


EURASIP Journal on Audio, Speech, and Music Processing | 2010

Ecological acoustics perspective for content-based retrieval of environmental sounds

Gerard Roma; Jordi Janer; Stefan Kersten; Mattia Schirosa; Perfecto Herrera; Xavier Serra

In this paper we present a method to search for environmental sounds in large unstructured databases of user-submitted audio, using a general sound events taxonomy from ecological acoustics. We discuss the use of Support Vector Machines to classify sound recordings according to the taxonomy and describe two use cases for the obtained classification models: a content-based web search interface for a large audio database and a method for segmenting field recordings to assist sound design.
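The segmentation use case can be sketched as a simple post-processing step: classify each analysis frame of a field recording (here the per-frame predictions are assumed given, e.g. from an SVM), then merge consecutive frames sharing a label into segments. This is an illustrative simplification, not the paper's segmentation algorithm:

```python
# Illustrative sketch: merge per-frame class predictions into segments.
# Frame labels are assumed to come from a classifier (e.g. an SVM);
# the labels used here are hypothetical.

def segment_recording(frame_labels):
    """Merge runs of identical labels into (label, start, end) segments,
    with start/end given as inclusive frame indices."""
    segments = []
    for i, label in enumerate(frame_labels):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i)  # extend current run
        else:
            segments.append((label, i, i))  # start a new segment
    return segments
```

In practice a smoothing step (e.g. discarding very short runs) would usually precede the merge, since frame-level classifiers produce noisy label sequences.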

Collaboration


Dive into Perfecto Herrera's collaborations.

Top Co-Authors

Xavier Serra (Pompeu Fabra University)
Nicolas Wack (Pompeu Fabra University)
Gerard Roma (Pompeu Fabra University)
Òscar Celma (Pompeu Fabra University)
Pedro Cano (Pompeu Fabra University)