Publication


Featured research published by Joan Serrà.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification

Joan Serrà; Emilia Gómez; Perfecto Herrera; Xavier Serra

We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the music information retrieval (MIR) community over the past years, as it provides a direct and objective way to evaluate music similarity algorithms. This paper first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or dynamic time warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best performing ones are finally applied to the newly proposed method. Multiple evaluations of the proposed method confirm a large increase in identification accuracy when comparing it with alternative state-of-the-art approaches.


New Journal of Physics | 2009

Cross recurrence quantification for cover song identification

Joan Serrà; Xavier Serra; Ralph G. Andrzejak

There is growing evidence that nonlinear time series analysis techniques can be used to successfully characterize, classify, or process signals derived from real-world dynamics even though these are not necessarily deterministic and stationary. In the present study, we proceed in this direction by addressing an important problem our modern society is facing, the automatic classification of digital information. In particular, we address the automatic identification of cover songs, i.e. alternative renditions of a previously recorded musical piece. For this purpose, we here propose a recurrence quantification analysis measure that allows the tracking of potentially curved and disrupted traces in cross recurrence plots (CRPs). We apply this measure to CRPs constructed from the state space representation of musical descriptor time series extracted from the raw audio signal. We show that our method identifies cover songs with a higher accuracy as compared to previously published techniques. Beyond the particular application proposed here, we discuss how our approach can be useful for the characterization of a variety of signals from different scientific disciplines. We study coupled Rössler dynamics with stochastically modulated mean frequencies as one concrete example to illustrate this point.
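As a rough illustration of the idea behind a cross recurrence plot, the following Python sketch delay-embeds two scalar series and marks pairs of states that lie within a distance threshold. It is not the measure proposed in the paper: the function names, the Euclidean distance, the fixed threshold `eps`, and the strict straight-diagonal score are all assumptions made here for illustration, whereas the paper's measure tracks curved and disrupted traces.

```python
import math


def delay_embed(series, dim, tau):
    """Delay-coordinate vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n)]


def cross_recurrence_plot(x, y, dim=2, tau=1, eps=0.1):
    """Binary matrix R[i][j] = 1 when embedded states x_i and y_j are closer than eps.

    Assumption: Euclidean distance with a fixed threshold; other choices exist.
    """
    xs = delay_embed(x, dim, tau)
    ys = delay_embed(y, dim, tau)
    return [[1 if math.dist(a, b) < eps else 0 for b in ys] for a in xs]


def longest_diagonal(crp):
    """Length of the longest straight diagonal run of ones.

    A deliberately strict stand-in: it does not tolerate curvature or gaps.
    """
    if not crp:
        return 0
    best = 0
    prev = [0] * len(crp[0])
    for row in crp:
        cur = [0] * len(row)
        for j, value in enumerate(row):
            if value:
                cur[j] = (prev[j - 1] if j > 0 else 0) + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical usage: y is an excerpt of x, so a long diagonal should appear.
x = [math.sin(0.3 * i) for i in range(40)]
y = x[5:25]
crp = cross_recurrence_plot(x, y, dim=2, tau=1, eps=1e-9)
score = longest_diagonal(crp)
```

A long diagonal indicates that the two series pass through similar state sequences, which is the intuition the abstract builds on for matching a cover to its original.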


Advances in Music Information Retrieval | 2010

Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation, and Beyond

Joan Serrà; Emilia Gómez; Perfecto Herrera

A cover version is an alternative rendition of a previously recorded song. Given that a cover may differ from the original song in timbre, tempo, structure, key, arrangement, or language of the vocals, automatically identifying cover songs in a given music collection is a rather difficult task. The music information retrieval (MIR) community has paid much attention to this task in recent years and many approaches have been proposed. This chapter comprehensively summarizes the work done in cover song identification while encompassing the background related to this area of research. The most promising strategies are reviewed and qualitatively compared under a common framework, and their evaluation methodologies are critically assessed. A discussion on the remaining open issues and future lines of research closes the chapter.


IEEE Transactions on Multimedia | 2014

Unsupervised music structure annotation by time series structure features and segment similarity

Joan Serrà; Meinard Müller; Peter Grosche; Josep Lluis Arcos

Automatically inferring the structural properties of raw multimedia documents is essential in today's digitized society. Given their hierarchical and multi-faceted organization, musical pieces represent a challenge for current computational systems. In this article, we present a novel approach to music structure annotation based on the combination of structure features with time series similarity. Structure features encapsulate both local and global properties of a time series, and allow us to detect boundaries between homogeneous, novel, or repeated segments. Time series similarity is used to identify equivalent segments, corresponding to musically meaningful parts. Extensive tests with a total of five benchmark music collections and seven different human annotations show that the proposed approach is robust to different ground truth choices and parameter settings. Moreover, we see that it outperforms previous approaches evaluated under the same framework.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Predictability of Music Descriptor Time Series and its Application to Cover Song Detection

Joan Serrà; Holger Kantz; Xavier Serra; Ralph G. Andrzejak

Intuitively, music has both predictable and unpredictable components. In this paper, we assess this qualitative statement in a quantitative way using common time series models fitted to state-of-the-art music descriptors. These descriptors cover different musical facets and are extracted from a large collection of real audio recordings comprising a variety of musical genres. Our findings show that music descriptor time series exhibit a certain predictability not only for short time intervals, but also for mid-term and relatively long intervals. This fact is observed independently of the descriptor, musical facet and time series model we consider. Moreover, we show that our findings are not only of theoretical relevance but can also have practical impact. To this end we demonstrate that music predictability at relatively long time intervals can be exploited in a real-world application, namely the automatic identification of cover songs (i.e., different renditions or versions of the same musical piece). Importantly, this prediction strategy yields a parameter-free approach for cover song identification that is substantially faster, allows for reduced computational storage and still maintains highly competitive accuracies when compared to state-of-the-art systems.


Multimedia Tools and Applications | 2010

Indexing music by mood: design and integration of an automatic content-based annotator

Cyril Laurier; Owen Meyers; Joan Serrà; Martin Blech; Perfecto Herrera; Xavier Serra

In the context of content analysis for indexing and retrieval, a method for creating automatic music mood annotation is presented. The method is based on results from psychological studies and framed into a supervised learning approach using musical features automatically extracted from the raw audio signal. We present here some of the most relevant audio features to solve this problem. A ground truth, used for training, is created using both social network information systems (wisdom of crowds) and individual experts (wisdom of the few). At the experimental level, we evaluate our approach on a database of 1,000 songs. Tests of different classification methods, configurations and optimizations have been conducted, showing that Support Vector Machines perform best for the task at hand. Moreover, we evaluate the algorithm robustness against different audio compression schemes. This fact, often neglected, is fundamental to build a system that is usable in real conditions. In addition, the integration of a fast and scalable version of this technique with the European Project PHAROS is discussed. This real world application demonstrates the usability of this tool to annotate large-scale databases. We also report on a user evaluation in the context of the PHAROS search engine, asking people about the utility, interest and innovation of this technology in real world use cases.


International Symposium on Multimedia | 2009

From Low-Level to High-Level: Comparative Study of Music Similarity Measures

Dmitry Bogdanov; Joan Serrà; Nicolas Wack; Perfecto Herrera

Studying the ways to recommend music to a user is a central task within the music information research community. From a content-based point of view, this task can be regarded as obtaining a suitable distance measurement between songs defined on a certain feature space. We propose two such distance measures. First, a low-level measure based on tempo-related aspects, and second, a high-level semantic measure based on regression by support vector machines of different groups of musical dimensions such as genre and culture, moods and instruments, or rhythm and tempo. We evaluate these distance measures against a number of state-of-the-art measures objectively, based on 17 ground truth musical collections, and subjectively, based on 12 listeners’ ratings. Results show that, in spite of being conceptually different, the proposed methods achieve comparable or even higher performance than the considered baseline approaches. Furthermore, they open up the possibility to explore distance metrics that are based on truly semantic notions.


PLOS ONE | 2012

Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

Martín Haro; Joan Serrà; Perfecto Herrera; Alvaro Corral

Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis on the intrinsic characteristics of most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of the environmental sounds, these database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
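To make the notion of a rank-frequency distribution concrete, here is a minimal Python sketch that ranks discrete code-words by frequency and estimates a Zipf exponent from the slope of a log-log least-squares fit. The function names and the synthetic tokens are hypothetical, and the least-squares fit is a crude estimator chosen here for brevity; it is not claimed to be the fitting procedure used in the paper.

```python
import math
from collections import Counter


def rank_frequency(codewords):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent code-word."""
    counts = sorted(Counter(codewords).values(), reverse=True)
    return list(enumerate(counts, start=1))


def zipf_exponent(rank_freq):
    """Negated least-squares slope of log(frequency) against log(rank).

    Assumption: a plain log-log regression, used only as a rough estimate.
    """
    if len(rank_freq) < 2:
        raise ValueError("need at least two distinct code-words")
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return -num / den


# Synthetic code-words whose frequency is exactly 600 / rank (an ideal Zipf law).
tokens = []
for rank in range(1, 7):
    tokens.extend([f"c{rank}"] * (600 // rank))

rf = rank_frequency(tokens)
alpha = zipf_exponent(rf)
```

For the synthetic data above the frequencies follow 600 / rank exactly, so the estimated exponent comes out at one up to floating-point error, matching the "exponent close to one" behaviour the abstract describes.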


Knowledge-Based Systems | 2016

Particle swarm optimization for time series motif discovery

Joan Serrà; Josep Lluis Arcos

We consider the task of finding repeated segments or motifs in time series. We propose a new standpoint to the task: formulating it as an optimization problem. We apply particle swarm optimization to solve the problem. The proposed solution finds comparable motifs in substantially less time. The proposed standpoint brings in an unprecedented degree of flexibility to the task.

Efficiently finding similar segments or motifs in time series data is a fundamental task that, due to the ubiquity of these data, is present in a wide range of domains and situations. Because of this, countless solutions have been devised but, to date, none of them seems to be fully satisfactory and flexible. In this article, we propose an innovative standpoint and present a solution coming from it: an anytime multimodal optimization algorithm for time series motif discovery based on particle swarms. By considering data from a variety of domains, we show that this solution is extremely competitive when compared to the state-of-the-art, obtaining comparable motifs in considerably less time using minimal memory. In addition, we show that it is robust to different implementation choices and see that it offers an unprecedented degree of flexibility with regard to the task. All these qualities make the presented solution stand out as one of the most prominent candidates for motif discovery in long time series streams. Besides, we believe the proposed standpoint can be exploited in further time series analysis and mining tasks, widening the scope of research and potentially yielding novel effective solutions.


Journal of New Music Research | 2014

Intonation Analysis of Rāgas in Carnatic Music

Gopala Krishna Koduri; Vignesh Ishwar; Joan Serrà; Xavier Serra

Intonation is a fundamental music concept that has a special relevance in Indian art music. It is characteristic of a rāga and key to the musical expression of the artist. Describing intonation is of importance to several music information retrieval tasks such as developing similarity measures based on rāgas and artists. In this paper, we first assess rāga intonation qualitatively by analysing varṇaṁs, a particular form of Carnatic music compositions. We then approach the task of automatically obtaining a compact representation of the intonation of a recording from its pitch track. We propose two approaches based on the parametrization of pitch-value distributions: performance pitch histograms, and context-based svara distributions obtained by categorizing pitch contours based on the melodic context. We evaluate both approaches on a large Carnatic music collection and discuss their merits and limitations. We finally go through different kinds of contextual information that can be obtained to further improve the two approaches.

Collaboration


Joan Serrà's collaborations.

Top Co-Authors

Xavier Serra

Pompeu Fabra University

Josep Lluis Arcos

Spanish National Research Council

Martín Haro

Pompeu Fabra University
