
Publication


Featured research published by Paris Smaragdis.


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 2003

Non-negative matrix factorization for polyphonic music transcription

Paris Smaragdis; Judith C. Brown

We present a methodology for analyzing polyphonic musical passages composed of notes that exhibit a harmonically fixed spectral profile (such as piano notes). Taking advantage of this unique note structure, we can model the audio content of the musical passage by a linear basis transform and use non-negative matrix decomposition methods to estimate the spectral profile and the temporal information of every note. This approach results in a very simple and compact system that is not knowledge-based, but rather learns notes by observation.
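The decomposition described above can be sketched with standard multiplicative updates for KL-divergence NMF. This is a minimal illustration of the technique, not the authors' exact system: it assumes a precomputed magnitude spectrogram V (frequency x time), and the function name and update schedule are illustrative.

```python
import numpy as np

def nmf_kl(V, n_notes, n_iter=200, seed=0):
    """Factor a magnitude spectrogram V (freq x time) as W @ H using
    multiplicative updates for the KL divergence. Each column of W is
    one note's spectral profile; each row of H is its activation in time."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_notes)) + 1e-3
    H = rng.random((n_notes, T)) + 1e-3
    for _ in range(n_iter):
        WH = W @ H + 1e-9
        H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
        WH = W @ H + 1e-9
        W *= ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T
    return W, H
```

With as many components as distinct notes, the learned W columns tend to line up with the notes' spectra and the rows of H with their onsets and durations.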


Neurocomputing | 1998

Blind separation of convolved mixtures in the frequency domain

Paris Smaragdis

In this paper we employ information theoretic algorithms, previously used for separating instantaneous mixtures of sources, for separating convolved mixing in the frequency domain. It is observed that convolved mixing in the time domain corresponds to instantaneous mixing in the frequency domain. Such mixing can be inverted using simpler and more robust algorithms than the ones recently developed. Advantages of this approach are improved efficiency and better convergence features.
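The key observation, that convolutive mixing in time becomes instantaneous (bin-wise multiplicative) mixing in frequency, can be verified numerically. This sketch uses a single source and filter with enough zero padding for the identity to hold exactly; the signal lengths are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(256)   # source signal
a = rng.standard_normal(32)    # mixing/room filter

x = np.convolve(a, s)          # convolutive "mixing" in the time domain

# With sufficient zero padding, each frequency bin of the mixture is just
# the source bin scaled by the filter response at that bin: X[f] = A[f]*S[f],
# i.e. an instantaneous (scalar) mixture in every bin.
N = 512
assert np.allclose(np.fft.fft(x, N), np.fft.fft(a, N) * np.fft.fft(s, N))
```

In the multichannel case this means each frequency bin of the observed spectra is an instantaneous mixture of the source bins, so instantaneous ICA can be run independently per bin (up to the well-known permutation and scaling ambiguities).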


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Convolutive Speech Bases and Their Application to Supervised Speech Separation

Paris Smaragdis

In this paper, we present a convolutive basis decomposition method and its application on simultaneous speakers separation from monophonic recordings. The model we propose is a convolutive version of the nonnegative matrix factorization algorithm. Due to the nonnegativity constraint this type of coding is very well suited for intuitively and efficiently representing magnitude spectra. We present results that reveal the nature of these basis functions and we introduce their utility in separating monophonic mixtures of known speakers.
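The forward model of convolutive NMF can be written compactly: the spectrogram is approximated as a sum of lagged basis slices applied to time-shifted activations. This is a sketch of the reconstruction step only (the helper names are illustrative), not a full training loop.

```python
import numpy as np

def shift(H, t):
    """Shift activations H (components x time) right by t frames,
    zero-padding on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def conv_nmf_reconstruct(W, H):
    """Convolutive NMF forward model: V_hat = sum_t W[t] @ shift(H, t),
    where W has shape (n_lags, freq, components) -- one nonnegative
    spectral slice per time lag, so each basis spans several frames."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))
```

With a single lag (n_lags = 1) this reduces exactly to ordinary NMF; the extra lags are what let each basis capture a short spectro-temporal pattern rather than a single static spectrum.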


International Conference on Independent Component Analysis and Signal Separation | 2007

Supervised and semi-supervised separation of sounds from single-channel mixtures

Paris Smaragdis; Bhiksha Raj; Madhusudana Shashanka

In this paper we describe a methodology for model-based single-channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios: one where all sound types in the mixture are known, and the other where only the target or the interference models are known. The model we propose has close ties to non-negative decompositions and latent variable models commonly used for semantic analysis.
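The semi-supervised scenario, where only the target model is known, can be sketched with NMF-style machinery: bases pre-learned on the target are held fixed while extra bases are fitted to explain whatever else is in the mixture, and a Wiener-style mask then extracts the target's share. This is an assumed simplification (the paper's model is a sparse latent variable formulation); function and parameter names are illustrative.

```python
import numpy as np

def semi_supervised_separate(V, W_target, K_noise=8, n_iter=200, seed=0):
    """Extract a known sound type from a mixture spectrogram V (freq x time).
    W_target: nonnegative bases pre-learned on the target sound (held fixed).
    K_noise extra bases are learned from the mixture itself to absorb the
    interference; a soft mask then keeps the target's share of each bin."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    K_t = W_target.shape[1]
    W = np.hstack([W_target, rng.random((F, K_noise)) + 1e-3])
    H = rng.random((K_t + K_noise, T)) + 1e-3
    for _ in range(n_iter):
        WH = W @ H + 1e-9
        H *= (W.T @ (V / WH)) / W.sum(axis=0, keepdims=True).T
        WH = W @ H + 1e-9
        upd = ((V / WH) @ H.T) / H.sum(axis=1, keepdims=True).T
        W[:, K_t:] *= upd[:, K_t:]   # update only the learned noise bases
    V_target = W[:, :K_t] @ H[:K_t] + 1e-9
    return (V_target / (W @ H + 1e-9)) * V   # masked target estimate
```

In the fully supervised scenario both models are known, so W is fixed entirely and only the activations H are fitted.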


IEEE International Conference on Acoustics, Speech, and Signal Processing | 2012

Singing-voice separation from monaural recordings using robust principal component analysis

Po-Sen Huang; Scott Chen; Paris Smaragdis; Mark Hasegawa-Johnson

Separating singing voices from music accompaniment is an important task in many applications, such as music information retrieval, lyric recognition and alignment. Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure; on the other hand, singing voices can be regarded as relatively sparse within songs. In this paper, based on this assumption, we propose using robust principal component analysis for singing-voice separation from music accompaniment. Moreover, we examine the separation result by using a binary time-frequency masking method. Evaluations on the MIR-1K dataset show that this method can achieve around 1~1.4 dB higher GNSDR compared with two state-of-the-art approaches without using prior training or requiring particular features.
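The low-rank-plus-sparse decomposition at the heart of this approach is principal component pursuit, commonly solved with an inexact augmented Lagrangian method. Below is a compact generic solver sketch, not the authors' exact implementation; applied to a magnitude spectrogram M, the low-rank part L would model the repetitive accompaniment and the sparse part S the voice.

```python
import numpy as np

def rpca(M, lam=None, max_iter=500, tol=1e-7):
    """Robust PCA: decompose M into L (low rank) + S (sparse) by principal
    component pursuit via an inexact augmented Lagrangian method. Each
    iteration shrinks singular values for L and soft-thresholds entries for S."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # standard PCP weight
    norm_M = np.linalg.norm(M)
    mu = m * n / (4.0 * np.abs(M).sum() + 1e-12)
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(max_iter):
        # L-step: singular value thresholding of (M - S + Y/mu) by 1/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft-thresholding of (M - L + Y/mu) by lam/mu
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        Y = Y + mu * (M - L - S)                # dual ascent on the residual
        if np.linalg.norm(M - L - S) < tol * norm_M:
            break
        mu = min(mu * 1.05, 1e7)
    return L, S
```

A binary mask such as (|S| > |L|) applied to the mixture spectrogram then corresponds to the time-frequency masking step the paper evaluates.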


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization

Nasser Mohammadiha; Paris Smaragdis; Arne Leijon

Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that for each noise type a model is required to be trained a priori. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.


IEEE International Conference on Acoustics, Speech, and Signal Processing | 2008

Speech denoising using nonnegative matrix factorization with priors

Kevin W. Wilson; Bhiksha Raj; Paris Smaragdis; Ajay Divakaran

We present a technique for denoising speech using nonnegative matrix factorization (NMF) in combination with statistical speech and noise models. We compare our new technique to standard NMF and to a state-of-the-art Wiener filter implementation and show improvements in speech quality across a range of interfering noise types.


IEEE International Conference on Acoustics, Speech, and Signal Processing | 2014

Deep learning for monaural speech separation

Po-Sen Huang; Minje Kim; Mark Hasegawa-Johnson; Paris Smaragdis

Monaural source separation is useful for many real-world applications though it is a challenging problem. In this paper, we study deep learning for monaural speech separation. We propose the joint optimization of the deep learning models (deep neural networks and recurrent neural networks) with an extra masking layer, which enforces a reconstruction constraint. Moreover, we explore a discriminative training criterion for the neural networks to further enhance the separation performance. We evaluate our approaches using the TIMIT speech corpus for a monaural speech separation task. Our proposed models achieve about 3.8~4.9 dB SIR gain compared to NMF models, while maintaining better SDRs and SARs.
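The extra masking layer that enforces the reconstruction constraint has a simple closed form: the network's two output spectra are turned into a soft mask, so the two source estimates are guaranteed to sum back to the mixture. This numpy sketch shows the layer in isolation (the helper name is illustrative), independent of the DNN/RNN that produces y1 and y2.

```python
import numpy as np

def masking_layer(y1, y2, mixture_mag, eps=1e-8):
    """Soft time-frequency masking layer: convert two network output
    spectra (y1, y2) into source estimates constrained to sum to the
    mixture magnitude spectrogram."""
    m1 = np.abs(y1) / (np.abs(y1) + np.abs(y2) + eps)
    s1_hat = m1 * mixture_mag          # source 1 estimate
    s2_hat = (1.0 - m1) * mixture_mag  # source 2 estimate
    return s1_hat, s2_hat
```

Because the masks sum to one in every time-frequency bin, no energy is invented or lost relative to the mixture, and training losses on s1_hat and s2_hat propagate through the layer jointly, which is what the joint optimization exploits.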


IEEE Transactions on Audio, Speech, and Language Processing | 2015

Joint optimization of masks and deep recurrent neural networks for monaural source separation

Po-Sen Huang; Minje Kim; Mark Hasegawa-Johnson; Paris Smaragdis

Monaural source separation is important for many real-world applications. It is challenging because, with only a single channel of information available and without any constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including speech separation, singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30-4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30-2.48 dB GNSDR gain and 4.32-5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.


Computational Intelligence and Neuroscience | 2008

Probabilistic latent variable models as nonnegative factorizations

Madhusudana V. S. Shashanka; Bhiksha Raj; Paris Smaragdis

This paper presents a family of probabilistic latent variable models that can be used for analysis of nonnegative data. We show that there are strong ties between nonnegative matrix factorization and this family, and provide some straightforward extensions which can help in dealing with shift invariances, higher-order decompositions and sparsity constraints. We argue through these extensions that the use of this approach allows for rapid development of complex statistical models for analyzing nonnegative data.
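The simplest member of this family, symmetric PLCA, models a normalized nonnegative matrix as P(f,t) = sum_z P(z) P(f|z) P(t|z) and is fitted with EM. The sketch below is a minimal generic implementation of that model (names and iteration counts are illustrative); its fixed points coincide with those of KL-divergence NMF, which is the tie the paper develops.

```python
import numpy as np

def plca(V, K, n_iter=200, seed=0):
    """EM for symmetric PLCA: P(f,t) = sum_z P(z) P(f|z) P(t|z), fitted to
    a nonnegative matrix V (normalized internally). The factors form an
    NMF-style decomposition whose parts are proper probability distributions."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    V = V / V.sum()
    Pz = np.full(K, 1.0 / K)
    Pf = rng.random((F, K)); Pf /= Pf.sum(axis=0)
    Pt = rng.random((T, K)); Pt /= Pt.sum(axis=0)
    for _ in range(n_iter):
        # E-step: posterior P(z | f, t), shape (F, T, K)
        joint = Pz[None, None, :] * Pf[:, None, :] * Pt[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reweight the posterior by the observed data V
        q = V[:, :, None] * post
        Pz = q.sum(axis=(0, 1))
        Pf = q.sum(axis=1) / (Pz[None, :] + 1e-12)
        Pt = q.sum(axis=0) / (Pz[None, :] + 1e-12)
        Pz /= Pz.sum()
    return Pz, Pf, Pt
```

Writing W = Pf and H = diag(Pz) @ Pt.T recovers the familiar V ≈ W @ H form; the probabilistic view is what makes the sparsity, shift-invariance, and higher-order extensions mentioned above natural to state.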

Collaboration


Dive into Paris Smaragdis's collaborations.

Top Co-Authors

Bhiksha Raj, Carnegie Mellon University
Nasser Mohammadiha, Royal Institute of Technology
Bhiksha Ramakrishnan, Mitsubishi Electric Research Laboratories
Judith C. Brown, Massachusetts Institute of Technology
Petros T. Boufounos, Mitsubishi Electric Research Laboratories
Ajay Divakaran, Mitsubishi Electric Research Laboratories
Avatar