Publications


Featured research published by Zafar Rafii.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation

Zafar Rafii; Bryan Pardo

Repetition is a core principle in music. Many musical pieces are characterized by an underlying repeating structure over which varying elements are superimposed. This is especially true for pop songs where a singer often overlays varying vocals on a repeating accompaniment. On this basis, we present the REpeating Pattern Extraction Technique (REPET), a novel and simple approach for separating the repeating “background” from the non-repeating “foreground” in a mixture. The basic idea is to identify the periodically repeating segments in the audio, compare them to a repeating segment model derived from them, and extract the repeating patterns via time-frequency masking. Experiments on data sets of 1,000 song clips and 14 full-track real-world songs showed that this method can be successfully applied for music/voice separation, competing with two recent state-of-the-art approaches. Further experiments showed that REPET can also be used as a preprocessor to pitch detection algorithms to improve melody extraction.
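
The procedure described above lends itself to a compact outline. Below is a minimal, illustrative Python sketch of the REPET idea, assuming the repeating period (in frames) has already been estimated and using an elementwise median as the repeating segment model; the function name repet_mask and the soft-mask formulation are stand-ins, not the paper's exact implementation, and phase reconstruction is omitted.

    import numpy as np

    def repet_mask(V, period):
        """Soft mask for the repeating background of a magnitude spectrogram
        V (freq x frames), given the repeating period in frames: segment,
        take the elementwise median, compare to the mixture, mask."""
        n_freq, n_frames = V.shape
        n_seg = int(np.ceil(n_frames / period))
        pad = n_seg * period - n_frames
        Vp = np.pad(V, ((0, 0), (0, pad)), mode="edge")
        segments = Vp.reshape(n_freq, n_seg, period)      # stack whole segments
        model = np.median(segments, axis=1)               # repeating segment model
        # The repeating background cannot exceed the mixture magnitude.
        W = np.minimum(np.tile(model, (1, n_seg))[:, :n_frames], V)
        return W / (V + 1e-12)                            # soft time-frequency mask

    # Usage (V is the magnitude STFT of the mixture, period estimated beforehand):
    # background_mag = repet_mask(V, period) * V
    # vocals_mag     = V - background_mag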


IEEE Transactions on Signal Processing | 2014

Kernel Additive Models for Source Separation

Antoine Liutkus; Derry Fitzgerald; Zafar Rafii; Bryan Pardo; Laurent Daudet

Source separation consists of separating a signal into additive components. A topic of considerable interest with many applications, it has recently gathered much attention. Here, we introduce a new framework for source separation called Kernel Additive Modelling, which is based on local regression and permits efficient separation of multidimensional and/or nonnegative and/or non-regularly sampled signals. The main idea of the method is to assume that a source at some location can be estimated using its values at other locations nearby, where nearness is defined through a source-specific proximity kernel. Such a kernel provides an efficient way to account for features like periodicity, continuity, smoothness, stability over time or frequency, and self-similarity. In many cases, such local dynamics are indeed much more natural to assess than any global model such as a tensor factorization. This framework permits one to use different proximity kernels for different sources and to separate them using the iterative kernel backfitting algorithm we describe. As we show, kernel additive modelling generalizes many recent and efficient techniques for source separation and opens the path to creating and combining source models in a principled way. Experimental results on the separation of synthetic and audio signals demonstrate the effectiveness of the approach.
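
To make the kernel idea concrete, the sketch below uses a median filter as the local regression step: each source is re-estimated by median-filtering its current estimate over a source-specific neighbourhood (for instance, a long time-wise kernel for a stable background and a tall frequency-wise kernel for a broadband foreground), and the mixture is then redistributed with Wiener-like masks. The kernel shapes, iteration count, and the function name kernel_backfitting are illustrative assumptions, not the paper's exact configuration.

    import numpy as np
    from scipy.ndimage import median_filter

    def kernel_backfitting(V, kernels, n_iter=5):
        """Small sketch of kernel additive modelling with backfitting.
        V: mixture magnitude spectrogram (freq x frames).
        kernels: one (freq_size, time_size) neighbourhood per source."""
        n_sources = len(kernels)
        estimates = [V / n_sources for _ in range(n_sources)]   # even initial split
        for _ in range(n_iter):
            # Local regression: each source should resemble a locally
            # filtered version of itself under its own proximity kernel.
            models = [median_filter(S, size=k) for S, k in zip(estimates, kernels)]
            total = sum(M ** 2 for M in models) + 1e-12
            # Redistribute the mixture energy with Wiener-like masks.
            estimates = [V * (M ** 2) / total for M in models]
        return estimates

    # Usage, e.g. a time-stable background vs. a frequency-spread foreground:
    # background, foreground = kernel_backfitting(V, kernels=[(1, 31), (25, 1)])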


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Adaptive filtering for music/voice separation exploiting the repeating musical structure

Antoine Liutkus; Zafar Rafii; Roland Badeau; Bryan Pardo; Gaël Richard

The separation of the lead vocals from the background accompaniment in audio recordings is a challenging task. Recently, an efficient method called REPET (REpeating Pattern Extraction Technique) has been proposed to extract the repeating background from the non-repeating foreground. While effective on individual sections of a song, REPET does not allow for variations in the background (e.g. verse vs. chorus), and is thus limited to short excerpts only. We overcome this limitation and generalize REPET to permit the processing of complete musical tracks. The proposed algorithm tracks the period of the repeating structure and computes local estimates of the background pattern. Separation is performed by soft time-frequency masking, based on the deviation between the current observation and the estimated background pattern. Evaluation on a dataset of 14 complete tracks shows that this method can perform at least as well as a recent competitive music/voice separation method, while being computationally efficient.
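
One way to picture this generalization is to apply a REPET-style background estimate locally, on overlapping windows of the spectrogram, so that the period and the background model are allowed to change between sections such as verse and chorus. The sketch below reuses the repet_mask helper outlined earlier and a placeholder local period tracker; both are illustrative, not the exact algorithm of the paper.

    import numpy as np

    def adaptive_repet(V, estimate_period, window=400, hop=200):
        """Sketch: estimate the repeating background locally on overlapping
        windows so that the background may vary over the full track.
        estimate_period is a placeholder for a local period tracker."""
        n_freq, n_frames = V.shape
        mask_sum = np.zeros_like(V)
        counts = np.zeros(n_frames)
        for start in range(0, n_frames, hop):
            block = V[:, start:start + window]
            local_mask = repet_mask(block, estimate_period(block))  # earlier sketch
            stop = start + block.shape[1]
            mask_sum[:, start:stop] += local_mask
            counts[start:stop] += 1.0
        return mask_sum / np.maximum(counts, 1.0)     # averaged soft mask

    # background_mag = adaptive_repet(V, estimate_period) * V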


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

A simple music/voice separation method based on the extraction of the repeating musical structure

Zafar Rafii; Bryan Pardo

Repetition is a core principle in music. This is especially true for popular songs, generally marked by a noticeable repeating musical structure, over which the singer performs varying lyrics. On this basis, we propose a simple method for separating music and voice, by extraction of the repeating musical structure. First, the period of the repeating structure is found. Then, the spectrogram is segmented at period boundaries and the segments are averaged to create a repeating segment model. Finally, each time-frequency bin in a segment is compared to the model, and the mixture is partitioned using binary time-frequency masking by labeling bins similar to the model as the repeating background. Evaluation on a dataset of 1,000 song clips showed that this method can improve on the performance of an existing music/voice separation method without requiring particular features or complex frameworks.
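
For the first step, the repeating period can be estimated from how strongly the spectrogram correlates with time-shifted copies of itself. The sketch below computes an autocorrelation-based "beat spectrum" and picks its largest peak; the naive peak-picking and the minimum-period parameter are illustrative simplifications.

    import numpy as np

    def estimate_period(V, min_period=10):
        """Sketch: estimate the repeating period (in frames) of a magnitude
        spectrogram V (freq x frames) from the mean autocorrelation of its
        squared, mean-removed frequency rows."""
        P = V ** 2
        n_frames = P.shape[1]
        beat = np.zeros(n_frames)
        for row in P:
            row = row - row.mean()
            acf = np.correlate(row, row, mode="full")
            beat += acf[n_frames - 1:]          # keep non-negative lags only
        beat /= np.maximum(beat[0], 1e-12)      # normalize by the zero-lag value
        return int(np.argmax(beat[min_period:n_frames // 2]) + min_period)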


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Scalable audio separation with light Kernel Additive Modelling

Antoine Liutkus; Derry Fitzgerald; Zafar Rafii

Recently, Kernel Additive Modelling (KAM) was proposed as a unified framework to achieve multichannel audio source separation. Its main feature is to use kernel models for locally describing the spectrograms of the sources. Such kernels can capture source features such as repetitivity, stability over time and/or frequency, self-similarity, etc. KAM notably subsumes many popular and effective methods from the state of the art, including REPET and harmonic/percussive separation with median filters. However, it also comes with an important drawback in its initial form: its memory usage scales badly with the number of sources. Indeed, KAM requires the storage of the full-resolution spectrogram for each source, which may become prohibitive for full-length tracks or many sources. In this paper, we show how it can be combined with a fast compression algorithm of its parameters to address the scalability issue, thus enabling its use on small platforms or mobile devices.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Online REPET-SIM for real-time speech enhancement

Zafar Rafii; Bryan Pardo

REPET-SIM is a generalization of the REpeating Pattern Extraction Technique (REPET) that uses a similarity matrix to separate the repeating background from the non-repeating foreground in a mixture. The method assumes that the background (typically the music accompaniment) is dense and low-ranked, while the foreground (typically the singing voice) is sparse and varied. While this assumption is often true for background music and foreground voice in musical mixtures, it also often holds for background noise and foreground speech in noisy mixtures. We therefore propose here to extend REPET-SIM to noise/speech segregation. In particular, given the low computational complexity of the algorithm, we show that the method can be easily implemented online for real-time processing. Evaluation on a data set of 10 stereo mixtures of speech and real-world background noise showed that this online REPET-SIM can be successfully applied for real-time speech enhancement, performing as well as several competitive methods.
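
A rough picture of the online variant: keep a buffer of past spectrogram frames and, for each incoming frame, find its most similar frames in that buffer (one row of the similarity matrix) and take their elementwise median as the background estimate for that frame. The cosine similarity and the number of neighbours below are illustrative choices rather than the paper's exact settings.

    import numpy as np

    def online_background_frame(frame, past_frames, k=10):
        """Sketch: background estimate for one magnitude frame, given an
        array of past frames (n_past x n_freq). Cosine similarity selects
        the k most similar past frames; their median is the background."""
        B = np.asarray(past_frames)
        sims = B @ frame / (np.linalg.norm(B, axis=1) * np.linalg.norm(frame) + 1e-12)
        nearest = B[np.argsort(sims)[-k:]]                 # k most similar frames
        background = np.minimum(np.median(nearest, axis=0), frame)
        mask = background / (frame + 1e-12)                # soft mask for this frame
        return mask * frame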


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Combining rhythm-based and pitch-based methods for background and melody separation

Zafar Rafii; Zhiyao Duan; Bryan Pardo

Musical works are often composed of two characteristic components: the background (typically the musical accompaniment), which generally exhibits a strong rhythmic structure with distinctive repeating time elements, and the melody (typically the singing voice or a solo instrument), which generally exhibits a strong harmonic structure with a distinctive predominant pitch contour. Drawing from findings in cognitive psychology, we propose to investigate the simple combination of two dedicated approaches for separating those two components: a rhythm-based method that focuses on extracting the background via a rhythmic mask derived from identifying the repeating time elements in the mixture, and a pitch-based method that focuses on extracting the melody via a harmonic mask derived from identifying the predominant pitch contour in the mixture. Evaluation on a data set of song clips showed that combining two such contrasting yet complementary methods can help to improve separation performance, from the point of view of both components, compared with using only one of those methods, and also compared with two other state-of-the-art approaches.
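
Purely as an illustration of what combining two such masks can look like (this is a generic fusion rule, not the specific combination used in the paper), one simple option is to let the rhythm-based method vote for the background and the pitch-based method vote for the melody, and average the two votes per time-frequency bin:

    import numpy as np

    def combine_masks(background_mask, melody_mask):
        """Illustrative fusion of a rhythm-based background mask and a
        pitch-based melody mask (both in [0, 1], same shape): each bin's
        melody weight averages "melody says melody" and "rhythm says
        not background"."""
        melody = 0.5 * (melody_mask + (1.0 - background_mask))
        return 1.0 - melody, melody          # (background mask, melody mask)

    # bg, mel = combine_masks(rhythm_mask, pitch_mask)
    # background_mag, melody_mag = bg * V, mel * V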


4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA) | 2014

Kernel spectrogram models for source separation

Antoine Liutkus; Zafar Rafii; Bryan Pardo; Derry Fitzgerald; Laurent Daudet

In this study, we introduce a new framework called Kernel Additive Modelling for audio spectrograms that can be used for multichannel source separation. It assumes that the spectrogram of a source at any time-frequency bin is close to its value in a neighbourhood indicated by a source-specific proximity kernel. The rationale for this model is to easily account for features like periodicity, stability over time or frequency, self-similarity, etc. In many cases, such local dynamics are indeed much more natural to assess than any global model such as a tensor factorization. This framework permits one to use different proximity kernels for different sources and to estimate them blindly using their mixtures only. Estimation is performed using a variant of the kernel backfitting algorithm that allows for multichannel mixtures and permits parallelization. Experimental results on the separation of vocals from musical backgrounds demonstrate the efficiency of the approach.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

An audio fingerprinting system for live version identification using image processing techniques

Zafar Rafii; Bob Coover; Jinyu Han

Suppose that you are at a music festival checking out an artist, and you would like to quickly learn about the song that is being played (e.g., title, lyrics, album, etc.). If you have a smartphone, you could record a sample of the live performance and compare it against a database of existing recordings from the artist. Services such as Shazam or SoundHound will not work here, as this is not the typical setting for audio fingerprinting or query-by-humming systems: a live performance is neither identical to its studio version (e.g., variations in instrumentation, key, tempo, etc.) nor is it a hummed or sung melody. We propose an audio fingerprinting system that can handle live version identification by using image processing techniques. Compact fingerprints are derived using a log-frequency spectrogram and an adaptive thresholding method, and template matching is performed using the Hamming similarity and the Hough transform.
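
To make the fingerprinting step concrete, the sketch below binarizes a log-frequency spectrogram by comparing each bin to its local median (one simple form of adaptive thresholding) and scores two equally sized binary fingerprints with a Hamming-style similarity. The neighbourhood size is an assumption, and the Hough-transform alignment stage is omitted.

    import numpy as np
    from scipy.ndimage import median_filter

    def binary_fingerprint(log_spec, neighborhood=(25, 25)):
        """Sketch: binarize a log-frequency magnitude spectrogram by adaptive
        thresholding against its local median (neighbourhood size illustrative)."""
        return (log_spec > median_filter(log_spec, size=neighborhood)).astype(np.uint8)

    def hamming_similarity(fp_a, fp_b):
        """Fraction of matching bits between two equally sized binary fingerprints."""
        return float(np.mean(fp_a == fp_b))

    # Matching a query against a reference then amounts to sliding the query
    # fingerprint along the reference in time and keeping the best similarity.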


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

A simple user interface system for recovering patterns repeating in time and frequency in mixtures of sounds

Zafar Rafii; Antoine Liutkus; Bryan Pardo

Repetition is a fundamental element in generating and perceiving structure in audio. Especially in music, structures tend to be composed of patterns that repeat through time (e.g., rhythmic elements in a musical accompaniment) and also in frequency (e.g., different notes of the same instrument). The auditory system has the remarkable ability to parse such patterns by identifying repetitions within the audio mixture. On this basis, we propose a simple user interface system for recovering patterns repeating in time and frequency in mixtures of sounds. A user selects a region in the log-frequency spectrogram of an audio recording from which he or she wishes to recover a repeating pattern masked by an undesired element (e.g., a note masked by a cough). The selected region is then cross-correlated with the spectrogram to identify similar regions where the underlying pattern repeats. The identified regions are finally averaged over their repetitions and the repeating pattern is recovered.
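
The recovery step can be pictured as template matching on the spectrogram: the user-selected region (mean-removed) is correlated against the full log-frequency spectrogram to locate where the pattern repeats, and averaging those occurrences suppresses the non-repeating interference. The correlation-peak threshold and the naive peak-picking below are illustrative assumptions.

    import numpy as np
    from scipy.signal import correlate2d

    def recover_repeating_pattern(spec, region, threshold=0.8):
        """Sketch: find repetitions of a user-selected spectrogram region by
        cross-correlation and average them to recover the underlying pattern."""
        template = (region - region.mean()) / (region.std() + 1e-12)
        corr = correlate2d(spec, template, mode="valid") / template.size
        h, w = region.shape
        peaks = np.argwhere(corr >= threshold * corr.max())     # naive peak-picking
        occurrences = [spec[i:i + h, j:j + w] for i, j in peaks]
        return np.mean(occurrences, axis=0)      # averaged (recovered) pattern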

Collaboration


Dive into Zafar Rafii's collaborations.

Top Co-Authors

Bryan Pardo (Northwestern University)
Derry Fitzgerald (Cork Institute of Technology)
Zhiyao Duan (University of Rochester)