Israel Cohen
Technion – Israel Institute of Technology
Publication
Featured research published by Israel Cohen.
IEEE Transactions on Speech and Audio Processing | 2003
Israel Cohen
Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. We present an improved minima controlled recursive averaging (IMCRA) approach, for noise estimation in adverse environments involving nonstationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability. The speech presence probability is controlled by the minima values of a smoothed periodogram. The proposed procedure comprises two iterations of smoothing and minimum tracking. The first iteration provides a rough voice activity detection in each frequency band. Then, smoothing in the second iteration excludes relatively strong speech components, which makes the minimum tracking during speech activity robust. We show that in nonstationary noise environments and under low SNR conditions, the IMCRA approach is very effective. In particular, compared to a competitive method, it obtains a lower estimation error, and when integrated into a speech enhancement system achieves improved speech quality and lower residual noise.
IEEE Signal Processing Letters | 2002
Israel Cohen; Baruch Berdugo
In this letter, we introduce a minima controlled recursive averaging (MCRA) approach for noise estimation. The noise estimate is given by averaging past spectral power values and using a smoothing parameter that is adjusted by the signal presence probability in subbands. The presence of speech in subbands is determined by the ratio between the local energy of the noisy speech and its minimum within a specified time window. The noise estimate is computationally efficient, robust with respect to the input signal-to-noise ratio (SNR) and type of underlying additive noise, and characterized by the ability to quickly follow abrupt changes in the noise spectrum.
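The recursion described in this abstract can be sketched for a single subband as follows. This is a minimal illustration, not the authors' implementation: the function name `mcra_track`, the parameter values, and the sliding-window minimum (used here in place of whatever minimum-tracking scheme the letter specifies) are all assumptions.

```python
from collections import deque


def mcra_track(power, alpha_d=0.95, alpha_s=0.8, alpha_p=0.2,
               delta=5.0, window=50):
    """Track noise power in one subband from a sequence of |Y|^2 values.

    All parameter values are illustrative assumptions.
    """
    noise = power[0]       # running noise estimate
    smooth = power[0]      # smoothed local energy of the noisy signal
    p = 0.0                # smoothed speech presence probability
    recent = deque([smooth], maxlen=window)  # history for the minimum
    estimates = []
    for y2 in power:
        # Local energy of the noisy signal, smoothed over time.
        smooth = alpha_s * smooth + (1 - alpha_s) * y2
        recent.append(smooth)
        s_min = min(recent)
        # Speech is flagged when the local energy is far above its minimum.
        present = 1.0 if smooth > delta * s_min else 0.0
        p = alpha_p * p + (1 - alpha_p) * present
        # The smoothing parameter moves toward 1 when speech is likely,
        # which freezes the noise estimate during speech activity.
        alpha = alpha_d + (1 - alpha_d) * p
        noise = alpha * noise + (1 - alpha) * y2
        estimates.append(noise)
    return estimates
```

With a constant input the estimate stays at the input level, while a short high-energy burst raises it only slightly because the update is slowed as the presence probability grows.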
Signal Processing | 2001
Israel Cohen; Baruch Berdugo
In this paper, we present an optimally-modified log-spectral amplitude (OM-LSA) speech estimator and a minima controlled recursive averaging (MCRA) noise estimation approach for robust speech enhancement. The spectral gain function, which minimizes the mean-square error of the log-spectra, is obtained as a weighted geometric mean of the hypothetical gains associated with the speech presence uncertainty. The noise estimate is given by averaging past spectral power values, using a smoothing parameter that is adjusted by the speech presence probability in subbands. We introduce two distinct speech presence probability functions, one for estimating the speech and one for controlling the adaptation of the noise spectrum. The former is based on the time-frequency distribution of the a priori signal-to-noise ratio. The latter is determined by the ratio between the local energy of the noisy signal and its minimum within a specified time window. Objective and subjective evaluation under various environmental conditions confirm the superiority of the OM-LSA and MCRA estimators. Excellent noise suppression is achieved, while retaining weak speech components and avoiding the musical residual noise phenomena. © 2001 Elsevier Science B.V. All rights reserved.
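The "weighted geometric mean of the hypothetical gains" can be illustrated with a short sketch. It assumes the gain under speech presence (`gain_h1`) is computed elsewhere, and that the gain under speech absence is a fixed floor `gain_min`; the function name and the floor value of 0.1 are hypothetical choices for illustration.

```python
def omlsa_gain(gain_h1, p, gain_min=0.1):
    """Combine two hypothetical gains as a weighted geometric mean.

    gain_h1  -- gain assuming speech is present (computed elsewhere)
    p        -- speech presence probability in [0, 1]
    gain_min -- gain assuming speech is absent (illustrative value)
    """
    return (gain_h1 ** p) * (gain_min ** (1.0 - p))


# Usage: certain speech keeps gain_h1, certain absence falls to the floor.
for prob in (0.0, 0.5, 1.0):
    print(prob, omlsa_gain(0.9, prob))
```

Because the mean is geometric, the combination is a weighted arithmetic mean of the two gains in the log domain, which matches the log-spectral error criterion named in the abstract.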
IEEE Signal Processing Letters | 2002
Israel Cohen
We present an optimally modified log-spectral amplitude estimator, which minimizes the mean-square error of the log-spectra for speech signals under signal presence uncertainty. We propose an estimator for the a priori signal-to-noise ratio (SNR), and introduce an efficient estimator for the a priori speech absence probability. Speech presence probability is estimated for each frequency bin and each frame by a soft-decision approach, which exploits the strong correlation of speech presence in neighboring frequency bins of consecutive frames. Objective and subjective evaluation confirm superiority in noise suppression and quality of the enhanced speech.
IEEE Transactions on Audio, Speech, and Language Processing | 2009
Shmulik Markovich; Sharon Gannot; Israel Cohen
In many practical environments we wish to extract several desired speech signals, which are contaminated by nonstationary and stationary interfering signals. The desired signals may also be subject to distortion imposed by the acoustic room impulse responses (RIRs). In this paper, a linearly constrained minimum variance (LCMV) beamformer is designed for extracting the desired signals from multimicrophone measurements. The beamformer satisfies two sets of linear constraints. One set is dedicated to maintaining the desired signals, while the other set is chosen to mitigate both the stationary and nonstationary interferences. Unlike classical beamformers, which approximate the RIRs as delay-only filters, we take into account the entire RIR [or its respective acoustic transfer function (ATF)]. The LCMV beamformer is then reformulated in a generalized sidelobe canceler (GSC) structure, consisting of a fixed beamformer (FBF), blocking matrix (BM), and adaptive noise canceler (ANC). It is shown that for a spatially white noise field, the beamformer reduces to an FBF, satisfying the constraint sets, without power minimization. It is shown that the application of the ANC contributes to interference reduction, but only when the constraint sets are not completely satisfied. We show that relative transfer functions (RTFs), which relate the desired speech sources and the microphones, and a basis for the interference subspace suffice for constructing the beamformer. The RTFs are estimated by applying the generalized eigenvalue decomposition (GEVD) procedure to the power spectral density (PSD) matrices of the received signals and the stationary noise. A basis for the interference subspace is estimated by collecting eigenvectors, calculated in segments where nonstationary interfering sources are active and the desired sources are inactive. The rank of the basis is then reduced by the application of the orthogonal triangular decomposition (QRD). This procedure relaxes the common requirement for nonoverlapping activity periods of the interference sources. A comprehensive experimental study in both simulated and real environments demonstrates the performance of the proposed beamformer.
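For reference, the textbook closed-form LCMV solution for a single frequency bin can be sketched with NumPy. This is the standard formula w = R^{-1} C (C^H R^{-1} C)^{-1} g, not the GSC implementation or the estimation procedure from the paper; the function name, the constraint convention C^H w = g, and the random test data are assumptions.

```python
import numpy as np


def lcmv_weights(R, C, g):
    """Minimize w^H R w subject to C^H w = g (one frequency bin).

    R -- (M, M) Hermitian positive-definite noise covariance
    C -- (M, K) constraint matrix (e.g. transfer functions as columns)
    g -- (K,) desired responses (1 to keep a source, 0 to null it)
    """
    R_inv_C = np.linalg.solve(R, C)   # R^{-1} C without an explicit inverse
    gram = C.conj().T @ R_inv_C       # C^H R^{-1} C
    return R_inv_C @ np.linalg.solve(gram, g)


# Usage with made-up data: 4 microphones, keep source 1, null source 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = A @ A.conj().T + np.eye(4)
C = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
w = lcmv_weights(R, C, np.array([1.0, 0.0]))
print(np.round(C.conj().T @ w, 6))
```

When R is the identity (a spatially white noise field), the weights reduce to C (C^H C)^{-1} g, which depends only on the constraints; this is consistent with the abstract's remark that the beamformer then reduces to a fixed beamformer.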
IEEE Signal Processing Letters | 2004
Israel Cohen
We propose a noncausal estimator for the a priori signal-to-noise ratio (SNR), and a corresponding noncausal speech enhancement algorithm. In contrast to the decision-directed estimator of Ephraim and Malah (1984), the noncausal estimator is capable of discriminating between speech onsets and noise irregularities. Onsets of speech are better preserved, while a further reduction of musical noise is achieved. Experimental results show that the noncausal estimator yields a higher improvement in the segmental SNR, lower log-spectral distortion, and better Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862).
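The abstract does not give the noncausal estimator itself, so the sketch below shows only the causal decision-directed baseline it is compared against, for one frequency bin. The weighting factor 0.98, the lower bound on the a priori SNR, and the use of a Wiener gain in place of the Ephraim-Malah amplitude estimator are assumptions made to keep the example short.

```python
def decision_directed(gamma, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Decision-directed a priori SNR estimate for one frequency bin.

    gamma -- sequence of a posteriori SNR values |Y|^2 / noise variance
    alpha, xi_min -- illustrative values, not taken from the letter
    """
    xi_track = []
    prev_amp2 = 0.0  # previous clean-amplitude estimate, squared,
                     # normalized by the noise variance
    for g in gamma:
        # Blend the previous frame's estimate with the current
        # maximum-likelihood estimate max(gamma - 1, 0).
        xi = alpha * prev_amp2 + (1 - alpha) * max(g - 1.0, 0.0)
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)        # Wiener gain as a simple stand-in
        prev_amp2 = gain * gain * g
        xi_track.append(xi)
    return xi_track
```

Because the estimate leans heavily on the previous frame, an isolated spike in the a posteriori SNR produces only a small a priori SNR, and a genuine speech onset is initially treated the same way. A causal estimator cannot tell the two apart from the current frame alone, which is the limitation the noncausal estimator targets.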
IEEE Transactions on Speech and Audio Processing | 2004
Sharon Gannot; Israel Cohen
In speech enhancement applications microphone array postfiltering allows additional reduction of noise components at a beamformer output. Among microphone array structures the recently proposed general transfer function generalized sidelobe canceller (TF-GSC) has shown impressive noise reduction abilities in a directional noise field, while still maintaining low speech distortion. However, in a diffuse noise field less significant noise reduction is obtainable. The performance is even further degraded when the noise signal is nonstationary. In this contribution we propose three postfiltering methods for improving the performance of microphone arrays. Two of them are based on single-channel speech enhancers and make use of recently proposed algorithms concatenated to the beamformer output. The third is a multichannel speech enhancer which exploits noise-only components constructed within the TF-GSC structure. This work concentrates on the assessment of the proposed postfiltering structures. An extensive experimental study, which consists of both objective and subjective evaluation in various noise fields, demonstrates the advantage of the multichannel postfiltering compared to the single-channel techniques.
IEEE Signal Processing Letters | 2009
Emanuël A. P. Habets; Sharon Gannot; Israel Cohen
In speech communication systems the received microphone signals are degraded by room reverberation and ambient noise that decrease the fidelity and intelligibility of the desired speaker. Reverberant speech can be separated into two components, viz. early speech and late reverberant speech. Recently, various algorithms have been developed to suppress late reverberant speech. One of the main challenges is to develop an estimator for the so-called late reverberant spectral variance (LRSV) which is required by most of these algorithms. In this letter a statistical reverberation model is proposed that takes the energy contribution of the direct-path into account. This model is then used to derive a more general LRSV estimator, which in a particular case reduces to an existing LRSV estimator. Experimental results show that the developed estimator is advantageous in case the source-microphone distance is smaller than the critical distance.
IEEE Transactions on Image Processing | 2013
Idan Ram; Michael Elad; Israel Cohen
We propose an image processing scheme based on reordering of the image's patches. For a given corrupted image, we extract all patches with overlaps, refer to these as coordinates in high-dimensional space, and order them such that they are chained in the “shortest possible path,” essentially solving the traveling salesman problem. The obtained ordering applied to the corrupted image implies a permutation of the image pixels to what should be a regular signal. This enables us to obtain good recovery of the clean image by applying relatively simple one-dimensional smoothing operations (such as filtering or interpolation) to the reordered set of pixels. We explore the use of the proposed approach to image denoising and inpainting, and show promising results in both cases.
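The reorder-then-smooth idea can be sketched in a few lines. The sketch substitutes a plain greedy nearest-neighbor walk for the shortest-path ordering and a moving average for the one-dimensional smoothing; both are simplifications assumed for illustration, and the function names are hypothetical. Patches are given as equal-length tuples, with one value (for example the center pixel) attached to each.

```python
def greedy_order(patches):
    """Chain patches by repeatedly jumping to the nearest unvisited one."""
    remaining = set(range(1, len(patches)))
    order = [0]
    while remaining:
        last = patches[order[-1]]
        # Squared Euclidean distance; ties are broken by index.
        nxt = min(
            remaining,
            key=lambda j: (sum((a - b) ** 2
                               for a, b in zip(last, patches[j])), j),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


def smooth_reordered(values, order, radius=1):
    """Moving-average values along the ordering, then undo the permutation."""
    n = len(order)
    out = [0.0] * n
    for pos, idx in enumerate(order):
        lo, hi = max(0, pos - radius), min(n, pos + radius + 1)
        window = [values[order[k]] for k in range(lo, hi)]
        out[idx] = sum(window) / len(window)
    return out


# Usage: similar patches end up adjacent in the ordering.
patches = [(0, 0), (10, 10), (0, 1), (10, 11)]
print(greedy_order(patches))
```

The greedy walk costs O(n^2) distance evaluations, so a practical version for full images would restrict the search to a neighborhood around each patch.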
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Yekutiel Avargel; Israel Cohen
In this paper, we investigate the influence of crossband filters on a system identifier implemented in the short-time Fourier transform (STFT) domain. We derive analytical relations between the number of crossband filters, which are useful for system identification in the STFT domain, and the power and length of the input signal. We show that increasing the number of crossband filters does not necessarily imply a lower steady-state mean-square error (mse) in subbands. The number of useful crossband filters depends on the power ratio between the input signal and the additive noise signal. Furthermore, it depends on the effective length of the input signal employed for system identification, which is restricted to enable tracking capability of the algorithm during time variations in the system. As the power of the input signal increases or as the time variations in the system become slower, a larger number of crossband filters may be utilized. The proposed subband approach is compared to the conventional fullband approach and to the commonly used subband approach that relies on multiplicative transfer function (MTF) approximation. The comparison is carried out in terms of mse performance and computational complexity. Experimental results verify the theoretical derivations and demonstrate the relations between the number of useful crossband filters and the power and length of the input signal.