
Publication


Featured research published by Philipos C. Loizou.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Evaluation of Objective Quality Measures for Speech Enhancement

Yi Hu; Philipos C. Loizou

In this paper, we evaluate the performance of several objective measures in terms of predicting the quality of noisy speech enhanced by noise suppression algorithms. The objective measures considered a wide range of distortions introduced by four types of real-world noise at two signal-to-noise ratio levels by four classes of speech enhancement algorithms: spectral subtractive, subspace, statistical-model based, and Wiener algorithms. The subjective quality ratings were obtained using the ITU-T P.835 methodology designed to evaluate the quality of enhanced speech along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports on the evaluation of correlations of several objective measures with these three subjective rating scales. Several new composite objective measures are also proposed by combining the individual objective measures using nonparametric and parametric regression analysis techniques.
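As a rough illustration of the parametric-regression route to composite measures, the sketch below fits a linear combination of individual objective scores to subjective ratings by least squares. The data, measure count, and plain linear model are illustrative assumptions, not the paper's exact procedure (which also explores nonparametric regression).

```python
import numpy as np

def fit_composite_measure(objective_scores, subjective_ratings):
    """Fit a linear composite measure: rating ~ w0 + w . scores.

    objective_scores: (n_conditions, n_measures) array of individual
    objective measure values (hypothetical data for illustration).
    subjective_ratings: (n_conditions,) mean subjective ratings.
    Returns the weight vector, intercept first.
    """
    X = np.column_stack([np.ones(len(subjective_ratings)), objective_scores])
    w, *_ = np.linalg.lstsq(X, subjective_ratings, rcond=None)
    return w

def predict_composite(w, objective_scores):
    """Apply a fitted composite measure to new objective scores."""
    X = np.column_stack([np.ones(objective_scores.shape[0]), objective_scores])
    return X @ w
```

The fitted weights can then be correlated with held-out subjective ratings to judge how much the combination improves on any single measure.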


Speech Communication | 2007

Subjective comparison and evaluation of speech enhancement algorithms

Yi Hu; Philipos C. Loizou

Making meaningful comparisons between the performance of the various speech enhancement algorithms proposed over the years has been elusive due to the lack of a common speech database, differences in the types of noise used, and differences in testing methodology. To facilitate such comparisons, we report on the development of a noisy speech corpus suitable for the evaluation of speech enhancement algorithms. This corpus is subsequently used for the subjective evaluation of 13 speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based, and Wiener-type algorithms. The subjective evaluation was performed by Dynastat, Inc. using the ITU-T P.835 methodology designed to evaluate speech quality along three dimensions: signal distortion, noise distortion, and overall quality. This paper reports the results of the subjective tests.


International Conference on Acoustics, Speech, and Signal Processing | 2002

A multi-band spectral subtraction method for enhancing speech corrupted by colored noise

Sunil Kamath; Philipos C. Loizou

The spectral subtraction method is a well-known noise reduction technique. Most implementations and variations of the basic technique advocate subtraction of the noise spectrum estimate over the entire speech spectrum. However, real world noise is mostly colored and does not affect the speech signal uniformly over the entire spectrum. In this paper, we propose a multi-band spectral subtraction approach which takes into account the fact that colored noise affects the speech spectrum differently at various frequencies. This method outperforms the standard power spectral subtraction method resulting in superior speech quality and largely reduced musical noise.
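A minimal sketch of the per-band idea for one frame is given below. The band layout, over-subtraction rule, tweaking factors `delta`, and spectral floor `beta` are illustrative placeholders; the paper's exact rules differ.

```python
import numpy as np

def multiband_spectral_subtraction(noisy_power, noise_power, band_edges,
                                   delta, beta=0.002):
    """One-frame multi-band power spectral subtraction (illustrative sketch).

    noisy_power, noise_power: power spectra of one frame (same length).
    band_edges: list of (lo, hi) bin ranges partitioning the spectrum.
    delta: per-band tweaking factors; beta: spectral floor fraction.
    """
    clean = np.empty_like(noisy_power)
    for (lo, hi), d in zip(band_edges, delta):
        # segmental SNR of this band decides how aggressively to subtract
        seg_snr = 10 * np.log10(noisy_power[lo:hi].sum()
                                / noise_power[lo:hi].sum())
        # over-subtraction factor shrinks as the band SNR improves
        alpha = np.clip(4.0 - 0.15 * seg_snr, 1.0, 5.0)
        sub = noisy_power[lo:hi] - alpha * d * noise_power[lo:hi]
        # floor the result to limit musical noise
        clean[lo:hi] = np.maximum(sub, beta * noisy_power[lo:hi])
    return clean
```

Because each band gets its own subtraction factor, colored noise that concentrates in some bands is attenuated there without over-suppressing cleaner bands.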


Journal of the Acoustical Society of America | 1997

Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs

Michael F. Dorman; Philipos C. Loizou; Dawne Rainey

Vowels, consonants, and sentences were processed through software emulations of cochlear-implant signal processors with 2-9 output channels. The signals were then presented, as either the sum of sine waves at the center of the channels or as the sum of noise bands the width of the channels, to normal-hearing listeners for identification. The results indicate, as previous investigations have suggested, that high levels of speech understanding can be obtained using signal processors with a small number of channels. The number of channels needed for high levels of performance varied with the nature of the test material. For the most difficult material--vowels produced by men, women, and girls--no statistically significant differences in performance were observed when the number of channels was increased beyond 8. For the least difficult material--sentences--no statistically significant differences in performance were observed when the number of channels was increased beyond 5. The nature of the output signal, noise bands or sine waves, made only a small difference in performance. The mechanism mediating the high levels of speech recognition achieved with only a few channels of stimulation may be the same one that mediates the recognition of signals produced by speakers with a high fundamental frequency, i.e., the levels of adjacent channels are used to determine the frequency of the input signal. The results of an experiment in which frequency information was altered but temporal information was not altered indicate that vowel recognition is based on information in the frequency domain even when the number of channels of stimulation is small.
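A sine-output channel vocoder of the kind emulated here can be sketched as: bandpass each channel, extract its envelope, and use the envelope to modulate a sine at the channel center frequency. The FFT-based filtering, rectify-and-average envelope extraction, and parameter names below are illustrative simplifications, not the study's implementation.

```python
import numpy as np

def sine_vocoder(x, fs, edges, env_cut=50.0):
    """Sine-wave channel vocoder sketch.

    x: input signal; fs: sample rate in Hz.
    edges: list of (lo, hi) channel band edges in Hz.
    env_cut: rough envelope cutoff in Hz (sets the smoothing window).
    """
    n = len(x)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    X = np.fft.rfft(x)
    t = np.arange(n) / fs
    out = np.zeros(n)
    # moving-average length approximating a lowpass at env_cut Hz
    win = max(1, int(fs / env_cut))
    kernel = np.ones(win) / win
    for lo, hi in edges:
        # crude bandpass: zero all FFT bins outside the channel
        band = np.fft.irfft(np.where((freqs >= lo) & (freqs < hi), X, 0), n)
        # envelope: full-wave rectify, then smooth
        env = np.convolve(np.abs(band), kernel, mode="same")
        fc = 0.5 * (lo + hi)  # sine carrier at the channel center
        out += env * np.sin(2 * np.pi * fc * t)
    return out
```

Replacing the sine carriers with bandlimited noise gives the noise-band variant compared in the study.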


IEEE Signal Processing Magazine | 1998

Mimicking the human ear

Philipos C. Loizou

A prosthetic device, called a cochlear implant, can be implanted in the inner ear and can restore partial hearing to profoundly deaf people. Some individuals with implants can now communicate without lip-reading or signing, and some can communicate over the telephone. The success of cochlear implants can be attributed to the combined efforts of scientists from various disciplines including bioengineering, physiology, otolaryngology, speech science, and signal processing. Each of these disciplines contributed to various aspects of the design of cochlear prostheses. Signal processing, in particular, played an important role in the development of different techniques for deriving electrical stimuli from the speech signal. Designers of cochlear prostheses were faced with the challenge of developing signal-processing techniques that would mimic the function of a normal cochlea. The purpose of this article is to present an overview of various signal-processing techniques that have been used for cochlear prostheses over 25 years. The signal-processing strategies described are only a subset of the many that have been developed for cochlear prostheses.


IEEE Transactions on Speech and Audio Processing | 2003

A generalized subspace approach for enhancing speech corrupted by colored noise

Yi Hu; Philipos C. Loizou

A generalized subspace approach is proposed for enhancement of speech corrupted by colored noise. A nonunitary transform, based on the simultaneous diagonalization of the clean speech and noise covariance matrices, is used to project the noisy signal onto a signal-plus-noise subspace and a noise subspace. The clean signal is estimated by nulling the signal components in the noise subspace and retaining the components in the signal subspace. The applied transform has built-in prewhitening and can therefore be used in general for colored noise. The proposed approach is shown to be a generalization of the approach proposed by Y. Ephraim and H.L. Van Trees (see ibid., vol.3, p.251-66, 1995) for white noise. Two estimators are derived based on the nonunitary transform, one based on time-domain constraints and one based on spectral domain constraints. Objective and subjective measures demonstrate improvements over other subspace-based methods when tested with TIMIT sentences corrupted with speech-shaped noise and multi-talker babble.
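The core step can be sketched as a generalized eigendecomposition: diagonalize inv(Rn) @ Rx (which simultaneously diagonalizes the clean-speech and noise covariances), null the noise-subspace components, and apply a gain of lambda / (lambda + mu) in the eigen-domain. The variable names and the specific gain rule below follow the time-domain-constraint flavor only loosely; treat this as an illustrative sketch, not the paper's estimator.

```python
import numpy as np

def subspace_enhance(Ry, Rn, y, mu=1.0):
    """Generalized-subspace enhancement sketch for one signal vector.

    Ry: noisy-signal covariance; Rn: noise covariance; y: noisy vector;
    mu: Lagrange-multiplier-style trade-off constant (illustrative).
    """
    Rx = Ry - Rn  # clean-signal covariance estimate
    # non-unitary transform from the generalized eigenproblem;
    # prewhitening by Rn is built in, so colored noise is handled
    lam, V = np.linalg.eig(np.linalg.inv(Rn) @ Rx)
    lam, V = np.real(lam), np.real(V)
    # null noise-subspace components (lam <= 0), shrink the rest
    gain = np.where(lam > 0, lam / (lam + mu), 0.0)
    H = V @ np.diag(gain) @ np.linalg.inv(V)
    return H @ y
```

With white noise (Rn proportional to the identity) this collapses to an ordinary eigendecomposition of Rx, consistent with the Ephraim-Van Trees special case mentioned above.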


Speech Communication | 2006

A noise-estimation algorithm for highly non-stationary environments

Sundarrajan Rangachari; Philipos C. Loizou

A noise-estimation algorithm is proposed for highly non-stationary noise environments. The noise estimate is updated by averaging the noisy speech power spectrum using time- and frequency-dependent smoothing factors, which are adjusted based on the signal-presence probability in individual frequency bins. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is updated continuously by averaging past values of the noisy speech power spectra with a look-ahead factor. The local-minimum estimation algorithm adapts very quickly to highly non-stationary noise environments. This was confirmed by formal listening tests, which indicated that the proposed noise-estimation algorithm, when integrated into speech enhancement, was preferred over other noise-estimation algorithms.
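One update step of such a minimum-tracking estimator can be sketched as below. The constants (`beta`, `gamma`, `alpha_min`, `threshold`) and the simplified binary presence decision are illustrative assumptions; the published algorithm uses a soft signal-presence probability and different update details.

```python
import numpy as np

def update_noise_estimate(noisy_psd, state, beta=0.8, gamma=0.998,
                          alpha_min=0.85, threshold=2.0):
    """One frame of minimum-tracking noise estimation (illustrative sketch).

    noisy_psd: current noisy-speech power spectrum (per frequency bin).
    state: (p_min, noise_est) arrays carried between frames.
    Returns the updated (p_min, noise_est).
    """
    p_min, noise_est = state
    # continuous local-minimum tracking with a look-ahead factor:
    # rise slowly when the spectrum exceeds the minimum, drop immediately
    rising = noisy_psd > p_min
    p_min = np.where(rising,
                     gamma * p_min + ((1 - gamma) / (1 - beta))
                     * (noisy_psd - beta * p_min),
                     noisy_psd)
    # ratio to the local minimum decides signal presence per bin
    speech_present = noisy_psd / np.maximum(p_min, 1e-12) > threshold
    # time-frequency dependent smoothing: freeze the estimate where
    # speech is present, otherwise track the noisy spectrum
    alpha = np.where(speech_present, 1.0, alpha_min)
    noise_est = alpha * noise_est + (1 - alpha) * noisy_psd
    return p_min, noise_est
```

Because the minimum drops instantly when the spectrum falls, the estimator keeps up with noise whose level changes quickly between speech segments.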


Journal of the Acoustical Society of America | 2009

Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions

Jianfen Ma; Yi Hu; Philipos C. Loizou

The articulation index (AI), speech-transmission index (STI), and coherence-based intelligibility metrics have been evaluated primarily in steady-state noisy conditions and have not been tested extensively in fluctuating noise conditions. The aim of the present work is to evaluate the performance of new speech-based STI measures, modified coherence-based measures, and AI-based measures operating on short-term (30 ms) intervals in realistic noisy conditions. Much emphasis is placed on the design of new band-importance weighting functions which can be used in situations wherein speech is corrupted by fluctuating maskers. The proposed measures were evaluated with intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train, and street interferences). Of all the measures considered, the modified coherence-based measures and speech-based STI measures incorporating signal-specific band-importance functions yielded the highest correlations (r=0.89-0.94). The modified coherence measure, in particular, that only included vowel/consonant transitions and weak consonant information yielded the highest correlation (r=0.94) with sentence recognition scores. The results from this study clearly suggest that the traditional AI and STI indices could benefit from the use of the proposed signal- and segment-dependent band-importance functions.
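The shared skeleton of AI-style measures with band-importance weighting can be sketched as: clip each band's SNR to a dynamic range, map it to [0, 1], and average with importance weights. The SNR range and the idea of caller-supplied (possibly signal-specific) weights below are illustrative; the paper's proposed measures define the weights and short-term processing in much more detail.

```python
import numpy as np

def weighted_intelligibility_index(band_snr_db, band_weights,
                                   snr_range=(-15.0, 15.0)):
    """AI-style intelligibility index sketch.

    band_snr_db: per-band SNR values in dB.
    band_weights: band-importance weights (fixed or signal-specific).
    snr_range: dB range mapped linearly onto [0, 1].
    """
    lo, hi = snr_range
    # transmission index per band: clip SNR, then scale to [0, 1]
    ti = (np.clip(band_snr_db, lo, hi) - lo) / (hi - lo)
    w = np.asarray(band_weights, dtype=float)
    # importance-weighted average across bands
    return float((w * ti).sum() / w.sum())
```

Swapping the fixed `band_weights` for weights derived from the signal itself is the essence of the signal-specific band-importance functions proposed above.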


Journal of the Acoustical Society of America | 2009

An algorithm that improves speech intelligibility in noise for normal-hearing listeners

Gibak Kim; Yang Lu; Yi Hu; Philipos C. Loizou

Traditional noise-suppression algorithms have been shown to improve speech quality, but not speech intelligibility. Motivated by prior intelligibility studies of speech synthesized using the ideal binary mask, an algorithm is proposed that decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target or the masker. Speech corrupted at low signal-to-noise ratio (SNR) levels (-5 and 0 dB) using different types of maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility (over 60 percentage points in -5 dB babble) over that attained by human listeners with unprocessed stimuli. The findings from this study suggest that algorithms that can reliably estimate the SNR in each T-F unit can improve speech intelligibility.
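The binary-masking operation underlying both this paper and the following one can be sketched with oracle target and masker powers: keep T-F units whose local SNR exceeds a criterion, zero the rest. This is the *ideal* mask; the algorithm above replaces the oracle decision with a Bayesian classifier. The floor constant and function names are illustrative.

```python
import numpy as np

def ideal_binary_mask(target_power, masker_power, lc_db=0.0):
    """Ideal binary mask over a T-F power grid (oracle sketch).

    target_power, masker_power: per-unit power of target and masker.
    lc_db: local SNR criterion in dB; units above it are kept.
    """
    local_snr_db = 10 * np.log10(np.maximum(target_power, 1e-12)
                                 / np.maximum(masker_power, 1e-12))
    return (local_snr_db > lc_db).astype(float)

def apply_mask(mixture_tf, mask):
    """Retain only the target-dominated T-F units of the mixture."""
    return mixture_tf * mask
```

Resynthesizing from the masked T-F representation yields the stimuli whose intelligibility is measured in these listening tests.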


Journal of the Acoustical Society of America | 2008

Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction

Ning Li; Philipos C. Loizou

The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, input SNR level, masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when masker-dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 to 5 dB. The existence of the plateau region suggests that it is the pattern of the ideal binary mask that matters the most rather than the local SNR of each T-F unit. This pattern directs the listener's attention to where the target is and enables them to segregate speech effectively in multitalker environments.
