Publication


Featured research published by Nicoleta Roman.


Speech Communication | 2006

Binary and Ratio Time-frequency Masks for Robust Speech Recognition

Soundararajan Srinivasan; Nicoleta Roman; DeLiang Wang

A time-varying Wiener filter specifies the ratio of a target signal and a noisy mixture in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech signal, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this system with a missing-data recognizer that operates in the spectral domain using the time-frequency units that are dominated by speech. To apply the missing-data recognizer, the same binaural processor is used to estimate an ideal binary time-frequency mask, which selects a local time-frequency unit if the speech signal within the unit is stronger than the interference. We find that the performance of the missing-data recognizer is better on a small-vocabulary recognition task, but the performance of the conventional recognizer is substantially better when the vocabulary size is increased.
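The difference between the two masks described in this abstract can be illustrated with a minimal sketch. It assumes the target and noise energies in each time-frequency unit are already known (the paper estimates them with a binaural processor, which is not reproduced here) and that the mixture energy is the sum of the two. The function names `ratio_mask` and `binary_mask` are hypothetical.

```python
def ratio_mask(target_energy, noise_energy):
    """Ratio mask per T-F unit: target energy over mixture energy."""
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(t_row, n_row):
            mixture = t + n  # assumption: target and noise are uncorrelated
            row.append(t / mixture if mixture > 0 else 0.0)
        mask.append(row)
    return mask


def binary_mask(target_energy, noise_energy):
    """Binary mask per T-F unit: 1 where the target is stronger than the noise."""
    return [
        [1 if t > n else 0 for t, n in zip(t_row, n_row)]
        for t_row, n_row in zip(target_energy, noise_energy)
    ]


# Toy energies: rows are frequency channels, columns are time frames.
target = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 3.0], [0.5, 1.0]]

print(ratio_mask(target, noise))
print(binary_mask(target, noise))
```

The ratio mask keeps a graded weight for every unit, while the binary mask makes a hard keep-or-discard decision per unit.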


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Binaural Tracking of Multiple Moving Sources

Nicoleta Roman; DeLiang Wang

This paper addresses the problem of tracking multiple moving sources using binaural input. We observe that binaural cues are strongly correlated with source locations in time-frequency regions dominated by only one source. Based on this observation, we propose a novel tracking algorithm that integrates probabilities across reliable frequency channels in order to produce a likelihood function in the target space, which describes the azimuths of all active sources at a particular time frame. Finally, a hidden Markov model (HMM) is employed to form continuous tracks and automatically detect the number of active sources across time. Results are presented for up to three moving talkers in anechoic conditions. A comparison shows that our HMM model outperforms a Kalman filter-based approach in tracking active sources across time. Our study represents a first step in addressing auditory scene analysis with moving sound sources.
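The two steps named in this abstract, integrating evidence across reliable frequency channels and linking frames with an HMM, can be sketched in a heavily simplified form. The sketch below tracks a single source over a discrete azimuth grid with a Viterbi search. The paper handles multiple sources and detects their number, which is not attempted here. The function names, the unnormalized transition score, and the `max_jump` and `jump_penalty` parameters are all assumptions made for illustration.

```python
import math


def integrate_channels(channel_likelihoods, reliable):
    """Sum log-likelihoods over reliable channels for each candidate azimuth."""
    n_azimuths = len(channel_likelihoods[0])
    total = [0.0] * n_azimuths
    for probs, ok in zip(channel_likelihoods, reliable):
        if ok:  # skip channels not dominated by a single source
            for a, p in enumerate(probs):
                total[a] += math.log(p)
    return total


def viterbi_track(frame_loglik, max_jump=1, jump_penalty=-1.0):
    """Most likely azimuth-index path; moves of more than max_jump are disallowed."""
    n_states = len(frame_loglik[0])
    score = list(frame_loglik[0])
    backptrs = []
    for obs in frame_loglik[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            lo, hi = max(0, j - max_jump), min(n_states - 1, j + max_jump)
            best_i = max(
                range(lo, hi + 1),
                key=lambda i: score[i] + jump_penalty * abs(i - j),
            )
            new_score.append(score[best_i] + jump_penalty * abs(best_i - j) + obs[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy per-frame log-likelihoods over four azimuth bins; the peak moves 0 -> 1 -> 2.
frames = [
    [0.0, -5.0, -5.0, -5.0],
    [-5.0, 0.0, -5.0, -5.0],
    [-5.0, -5.0, 0.0, -5.0],
]
print(viterbi_track(frames))
```

In a fuller pipeline, each row of `frames` would come from `integrate_channels` applied to that frame's per-channel azimuth likelihoods.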


Journal of the Acoustical Society of America | 2006

Binaural segregation in multisource reverberant environments

Nicoleta Roman; Soundararajan Srinivasan; DeLiang Wang

In a natural environment, speech signals are degraded by both reverberation and concurrent noise sources. While human listening is robust under these conditions using only two ears, current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as a foreground while the remaining stimuli are perceived as a background. Accordingly, the goal is to estimate an ideal time-frequency (T-F) binary mask, which selects the target if it is stronger than the interference in a local T-F unit. This paper proposes a binaural segregation system that extracts the reverberant target signal from multisource reverberant mixtures using only the location information of the target source. The proposed system combines target cancellation through adaptive filtering and a binary decision rule to estimate the ideal T-F binary mask. The main observation in this work is that the target attenuation in a T-F unit resulting from adaptive filtering is correlated with the relative strength of target to mixture. A comprehensive evaluation shows that the proposed system results in large SNR gains. In addition, comparisons using SNR as well as automatic speech recognition measures show that this system outperforms standard two-microphone beamforming approaches and a recent binaural processor.


Journal of the Acoustical Society of America | 2006

Pitch-based monaural segregation of reverberant speech

Nicoleta Roman; DeLiang Wang

In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.


Journal of the Acoustical Society of America | 2011

Intelligibility of reverberant noisy speech with ideal binary masking

Nicoleta Roman; John Woodruff

For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: it selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: an ideal binary mask that uses the direct path of the target as the desired signal, an ideal binary mask that uses the direct path and early reflections of the target as the desired signal, and an ideal binary mask that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech-shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in terms of speech reception threshold for normal-hearing listeners.
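The anechoic definition at the start of this abstract translates directly into a few lines of code. The sketch below is illustrative only: the function name `ideal_binary_mask` and the parameter `lc_db` (the local threshold in dB) are hypothetical, and the per-unit energies are assumed to be given.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Keep a T-F unit (1) when its local SNR in dB exceeds lc_db, else cancel it (0)."""
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(t_row, n_row):
            if t <= 0:
                keep = False      # no target energy in this unit
            elif n <= 0:
                keep = True       # target present, no noise
            else:
                keep = 10.0 * math.log10(t / n) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


# One frequency channel, two frames: local SNRs of +10 dB and -10 dB.
target = [[10.0, 1.0]]
noise = [[1.0, 10.0]]

print(ideal_binary_mask(target, noise, lc_db=0.0))
print(ideal_binary_mask(target, noise, lc_db=-12.0))
```

Lowering the threshold retains more units. The reverberant variants in the abstract differ in which part of the target counts as the desired signal, which this sketch leaves to the caller when computing `target_energy`.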


International Conference on Acoustics, Speech, and Signal Processing | 2004

Binaural sound segregation for multisource reverberant environments

Nicoleta Roman; DeLiang Wang

We present a novel method for binaural sound segregation from acoustic mixtures contaminated by both multiple interference and reverberation. We employ the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. As opposed to classical adaptive filtering, which focuses on the suppression of noise, our model employs an adaptive filter that performs target cancellation. T-F units dominated by a target are largely suppressed at the output of the cancellation unit when compared to units dominated by noise. Consequently, the actual input-to-output attenuation level in each T-F unit is used to estimate an ideal binary mask. A systematic evaluation in terms of automatic speech recognition performance shows that the resulting system produces masks close to ideal binary ones.
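The mask estimation step described here relies on comparing the energy before and after the target-cancellation filter. A minimal sketch of that comparison follows, assuming the per-unit input and output energies are already available. The adaptive filter itself is not shown, and the single global `threshold_db` is a hypothetical simplification chosen for illustration, not the decision rule from the paper.

```python
import math


def estimate_mask_from_attenuation(input_energy, output_energy, threshold_db=-6.0):
    """Label a T-F unit as target (1) when cancellation attenuates it below threshold_db."""
    mask = []
    for in_row, out_row in zip(input_energy, output_energy):
        row = []
        for e_in, e_out in zip(in_row, out_row):
            if e_in <= 0:
                row.append(0)     # empty unit: nothing to assign to the target
            elif e_out <= 0:
                row.append(1)     # fully cancelled: treated as target-dominated
            else:
                attenuation_db = 10.0 * math.log10(e_out / e_in)
                row.append(1 if attenuation_db < threshold_db else 0)
        mask.append(row)
    return mask


# First unit is attenuated by 10 dB (target-dominated), second by under 1 dB.
mixture_in = [[10.0, 10.0]]
cancelled_out = [[1.0, 9.0]]
print(estimate_mask_from_attenuation(mixture_in, cancelled_out))
```

Units that the cancellation stage suppresses strongly are the ones the mask retains, which inverts the usual role of an adaptive noise canceller.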


International Conference on Acoustics, Speech, and Signal Processing | 2003

Binaural tracking of multiple moving sources

Nicoleta Roman; DeLiang Wang

This paper presents a novel method for tracking the azimuth locations of multiple active sources based on binaural processing. Binaural cues are strongly correlated with source locations for spectral regions dominated by only one source. Therefore, this approach integrates reliable information across different frequency channels to produce a likelihood function in the target space. Finally, a hidden Markov model (HMM) is employed for forming continuous tracks and detecting the number of active sources across time. Experimental results are presented for simulated multi-source scenarios.


Journal of the Acoustical Society of America | 2013

Speech intelligibility in reverberation with ideal binary masking: effects of early reflections and signal-to-noise ratio threshold

Nicoleta Roman; John Woodruff

Ideal binary masking is a signal processing technique that separates a desired signal from a mixture by retaining only the time-frequency units where the signal-to-noise ratio (SNR) exceeds a predetermined threshold. In reverberant conditions there are multiple possible definitions of the ideal binary mask in that one may choose to treat the target early reflections as either desired signal or noise. The ideal binary mask may therefore be parameterized by the reflection boundary, a predetermined division point between early and late reflections. Another important parameter is the local SNR threshold used in labeling the time-frequency units as either target or background. Two experiments were designed to assess the impact of these two parameters on speech intelligibility with ideal binary masking for normal-hearing listeners in reverberant conditions. Experiment 1 shows that in order to achieve intelligibility improvements only the early reflections should be preserved by the binary mask. Moreover, it shows that the effective SNR should be accounted for when deciding the local threshold optimal range. Experiment 2 shows that with long reverberation times, intelligibility improvements are only obtained when the reflection boundary is 100 ms or less. Also, the experiment suggests that binary masking can be used for dereverberation.
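The reflection boundary can be pictured as a cut point in the room impulse response. The sketch below splits an impulse response into an early part (direct path plus early reflections) and a late part. It assumes the boundary is measured from the direct-path arrival and that the direct path is the largest-magnitude sample; both are assumptions of this illustration, and the function name `split_impulse_response` is hypothetical.

```python
def split_impulse_response(rir, sample_rate_hz, boundary_ms):
    """Split a room impulse response into early and late parts of equal length."""
    # Assumption: the direct path is the sample with the largest magnitude.
    direct_index = max(range(len(rir)), key=lambda i: abs(rir[i]))
    boundary_samples = int(round(boundary_ms * sample_rate_hz / 1000.0))
    cut = min(len(rir), direct_index + boundary_samples)
    early = rir[:cut] + [0.0] * (len(rir) - cut)
    late = [0.0] * cut + rir[cut:]
    return early, late


# Toy impulse response sampled at 1 kHz with a 2 ms reflection boundary.
rir = [0.0, 1.0, 0.5, 0.25, 0.1]
early, late = split_impulse_response(rir, sample_rate_hz=1000, boundary_ms=2.0)
print(early)
print(late)
```

Convolving the clean target with `early` would give the desired signal for the mask, while the target convolved with `late` would be grouped with the noise. Moving `boundary_ms` shifts reflections from one group to the other.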


Conference on Privacy, Security and Trust | 2006

Intelligent virus detection on mobile devices

Deepak Venugopal; Guoning Hu; Nicoleta Roman

In this paper, we describe a new solution for detecting mobile phone viruses. The solution is based on Bayesian decision theory using heuristic rules derived from common functionalities among different virus samples. Specifically, we detect viruses according to the DLL usage of a program, which is directly linked to the functionality of this program. Our solution is able to detect unknown viruses, especially the variants of existing ones. We evaluate our solution on the Symbian platform, where most viruses are present in the wild. We constructed a virus detector based on DLL functions from a small set of virus samples. It detects 95% of mobile viruses and yields no false alarms.
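One way to picture a Bayesian decision over DLL usage is a naive Bayes classifier on binary "function used / not used" features. This is a stand-in chosen for illustration, not the heuristic rules from the paper, and the function labels in the toy data are hypothetical placeholders rather than real Symbian DLL exports. The sketch also assumes the training set contains at least one sample of each class.

```python
import math


def train(samples, labels, vocabulary):
    """Estimate class priors and Laplace-smoothed per-function usage probabilities."""
    model = {}
    for cls in (True, False):  # True = virus, False = benign
        cls_samples = [s for s, y in zip(samples, labels) if y == cls]
        model[cls] = {
            "prior": len(cls_samples) / len(samples),
            "p": {
                f: (sum(f in s for s in cls_samples) + 1) / (len(cls_samples) + 2)
                for f in vocabulary
            },
        }
    return model


def log_posterior_ratio(model, used_functions):
    """log P(virus | usage) - log P(benign | usage) under the naive Bayes assumption."""
    ratio = math.log(model[True]["prior"]) - math.log(model[False]["prior"])
    for f, p_virus in model[True]["p"].items():
        p_benign = model[False]["p"][f]
        if f in used_functions:
            ratio += math.log(p_virus) - math.log(p_benign)
        else:
            ratio += math.log(1 - p_virus) - math.log(1 - p_benign)
    return ratio


def is_virus(model, used_functions, threshold=0.0):
    """Decide 'virus' when the log posterior ratio exceeds the threshold."""
    return log_posterior_ratio(model, used_functions) > threshold


# Hypothetical function labels and a tiny training set.
vocabulary = ["fn_send_message", "fn_read_contacts", "fn_draw_ui"]
samples = [
    {"fn_send_message", "fn_read_contacts"},
    {"fn_send_message"},
    {"fn_draw_ui"},
    {"fn_draw_ui", "fn_read_contacts"},
]
labels = [True, True, False, False]

model = train(samples, labels, vocabulary)
print(is_virus(model, {"fn_send_message", "fn_read_contacts"}))
print(is_virus(model, {"fn_draw_ui"}))
```

Raising `threshold` trades detection rate for fewer false alarms, which is the usual lever in a Bayesian decision rule.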


International Conference on Acoustics, Speech, and Signal Processing | 2002

Location-based sound segregation

Nicoleta Roman; DeLiang Wang; Guy J. Brown

At a cocktail party, we can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel location-based approach for speech segregation. The auditory masking effect motivates the notion of an “ideal” time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency region. We observe that, within a narrow frequency band, modifications to the relative energy of the target source with respect to the interfering energy trigger systematic deviations for binaural cues. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, we perform pattern classification in order to estimate ideal binary masks. A systematic evaluation shows that the resulting system produces masks very close to ideal binary ones and yields a large improvement over previous models.

Collaboration


Dive into Nicoleta Roman's collaboration.

Top Co-Authors

Guy J. Brown

University of Sheffield

C. Mihai

Ohio State University
