Speech Enhancement in Adverse Environments Based on Non-stationary Noise-driven Spectral Subtraction and SNR-dependent Phase Compensation
Md Tauhidul Islam, Asaduzzaman, Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad
Md Tauhidul Islam (a), Asaduzzaman (b), Celia Shahnaz (b), Wei-Ping Zhu (c), M. Omair Ahmad (c)

(a) Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA-77840
(b) Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka-1000, Bangladesh
(c) Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec H3G 1M8, Canada
Abstract
A two-step enhancement method based on spectral subtraction and phase spectrum compensation is presented in this paper for noisy speech in adverse environments involving non-stationary noise and medium to low levels of SNR. The magnitude of the noisy speech spectrum is modified in the first step of the proposed method by a spectral subtraction approach, where a new noise estimation method based on the low-frequency information of the noisy speech is introduced. We argue that this method of noise estimation is capable of estimating non-stationary noise accurately. The phase spectrum of the noisy speech is modified in the second step, which consists of phase spectrum compensation, where an SNR-dependent approach is incorporated to determine the amount of compensation to be imposed on the phase spectrum. A modified complex spectrum is obtained by aggregating the magnitude from the spectral subtraction step and the modified phase spectrum from the phase compensation step, which is found to be a better representation of the enhanced speech spectrum. Speech files available in the NOIZEUS database are used to carry out extensive simulations for evaluation of the proposed method.
Keywords:
Speech enhancement, spectral subtraction, magnitude compensation, phase compensation, noise estimation
1. Introduction
Corruption of speech signals by additive or multiplicative noise deteriorates the performance of speech processing applications such as speech communication and speech recognition. Removing the disturbing noise while preserving the speech is desirable for proper operation of these systems. Many speech enhancement methods have been proposed to achieve this goal. Spectral subtraction [1, 2, 3, 4, 5], Wiener filtering [6, 7], minimum mean square error (MMSE) estimators [8, 9], subspace-based methods [6, 7], thresholding methods based on the wavelet transform [10, 11, 12, 13] and Kalman filtering [14] are the prominent ones. Subspace- and wavelet-based approaches are computationally slow. Frequency-domain methods are computationally fast, but most of them need an estimate of the noise to perform speech enhancement.

Low computation with good performance in stationary noise makes spectral subtraction a very attractive and widely used method. In spectral subtraction, the noise spectrum is estimated and subtracted from the noisy speech spectrum. If the noise does not vary with time, i.e., the noise is stationary, the method works well. However, in the presence of non-stationary noise, the performance of this method degrades because of its inability to estimate the noise properly. Another problem of this method is the presence of musical noise, a noise of increasing variance. The first problem is the main concern of [15], where noise is estimated based on the high-order Yule-Walker equations without finding the non-speech frames. This method can track non-stationary noise but is computationally expensive. Another method, minimum statistics based spectral subtraction [16], can estimate non-stationary noise with less computation, but it depends on the noise estimates in past frames, which sometimes
Corresponding author
Email address: (Celia Shahnaz)
Preprint submitted to Elsevier, February 18, 2018

invokes wrong estimates or leads to speech distortion. In [17], the noise spectrum is estimated based on information from the high-frequency spectrum of the current frame. This method requires a very high sampling rate, which creates significant problems in the context of speech processing applications.

In the above-mentioned methods, although the spectrum of the noisy speech is complex-valued, only the magnitude is modified based on the estimate of the noise spectrum and the phase remains unchanged. This was done for a long time based on the assumption that the human auditory system is phase-deaf, i.e., cannot differentiate changes of phase, until the authors in [18] showed that the phase spectrum can also be very useful in speech enhancement. They used the phase spectrum in a spectral subtraction based approach to obtain enhanced speech. Later, the authors in [19, 20] also used this idea for speech enhancement. But these methods did not consider the magnitude spectrum at all. In this paper, we consider both magnitude and phase spectra and compensate both of them based on the noise characteristics. We develop a noise estimation approach that can track the time variation of non-stationary noise for magnitude spectrum compensation. The phase spectrum is compensated in an SNR-dependent phase compensation step. We aggregate the modified magnitude and phase from these two steps, and we find this modified complex spectrum effective in producing enhanced speech of improved quality with minimal speech distortion as compared to some of the state-of-the-art speech enhancement methods.

The paper is organized as follows. Section 2 presents the proposed method. Section 3 describes results. Concluding remarks are presented in Section 4.
2. Problem Formulation and Proposed Method
In an analysis, modification and synthesis (AMS) framework, the noisy speech frames are first transformed by a transformation method. Modifications are then carried out in the transformed domain, and finally the inverse transform followed by the overlap-add method is performed to reconstruct the enhanced speech. The proposed method is based on the AMS framework, where speech is analyzed, modified and synthesized frame-wise.

In the presence of additive noise d[n], a clean speech signal x[n] gets contaminated and produces noisy speech y[n]. The noisy speech can be segmented into overlapping frames by using a sliding window. The \ell-th windowed noisy speech frame can be expressed in the time domain as

y_\ell[n] = x_\ell[n] + d_\ell[n], \quad 1 \le \ell \le T,   (1)

where T is the total number of speech frames. If Y_\ell[k], X_\ell[k] and D_\ell[k] are the short-time Fourier transform (STFT) representations of y_\ell[n], x_\ell[n] and d_\ell[n], respectively, we can write

Y_\ell[k] = X_\ell[k] + D_\ell[k],   (2)

where 0 \le k \le N-1 and N is the total number of samples in a frame. The N-point STFT Y_\ell[k] of y_\ell[n] can be computed as

Y_\ell[k] = \sum_{n=0}^{N-1} y_\ell[n] e^{-j 2\pi nk/N}.   (3)

The Fourier transform of the noisy speech frame, Y_\ell[k], is modified in the proposed method to obtain an estimate of the clean speech spectrum X_\ell[k].

An overview of the proposed speech enhancement method is shown by a block diagram in Fig. 1. It is seen from Fig. 1 that the Fourier transform is first applied to each input speech frame. The magnitude of the Fourier spectrum is modified in a spectral subtraction method based on non-stationary noise estimation, which we call step-1. The modified magnitude from step-1 is then combined with the unchanged phase to obtain the modified complex spectrum. Using the inverse fast Fourier transform (IFFT) and overlap-add, an intermediate speech signal is obtained.
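The analysis stage of the AMS framework described by (1)-(3) can be sketched as follows. This is a minimal NumPy illustration, not the paper's MATLAB implementation; the frame length and window here are placeholders (the paper's actual settings are given in Section 3).

```python
import numpy as np

def stft_frames(y, frame_len=256, overlap=0.5, window=None):
    """Split a signal into overlapping windowed frames and take their DFTs.

    Implements the AMS analysis stage: each windowed frame y_l[n] is mapped
    to its N-point STFT Y_l[k] as in Eq. (3).
    """
    if window is None:
        window = np.hamming(frame_len)
    hop = int(frame_len * (1.0 - overlap))
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)      # Y_l[k], shape (T, N)

# Noisy observation y[n] = x[n] + d[n] (Eq. (1)); by linearity of the DFT,
# Y_l[k] = X_l[k] + D_l[k] (Eq. (2)).
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)            # stand-in "clean" tone
d = 0.1 * np.random.randn(fs)              # additive noise
Y = stft_frames(x + d, frame_len=96)       # 96-sample frames as in Sec. 3
```

Because each time-domain frame is real-valued, the resulting spectra are conjugate symmetric, a property step-2 relies on.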
The spectrum of the intermediate speech is sent to step-2, which consists of phase spectrum compensation (PSC) [18]. PSC modifies the phase spectrum based on the SNR of the intermediate speech. Using the modified phase spectrum with the modified magnitude spectrum from the first step, we obtain an enhanced complex spectrum. Finally, using the IFFT and overlap-add, the enhanced speech is constructed. The full AMS process is carried out in both steps to retain full flexibility in using different window sizes and parameters.

Figure 1: Block diagram of the proposed method.

In step-1, the magnitude spectrum of the noisy speech is modified based on the estimation of non-stationary noise. Unlike conventional methods, the estimate of the noise spectrum is updated in every silence period, and the low-frequency region of the magnitude spectrum is taken into consideration in order to compensate for the noise estimation errors that may be induced when the additive noise is non-stationary, i.e., changes its amplitude drastically with time. We propose to obtain an estimate |Z_\ell[k]| of |X_\ell[k]| as

|Z_\ell[k]| = \begin{cases} H_\ell[k] & \text{if } H_\ell[k] > 0 \\ \beta_s |Y_\ell[k]| & \text{otherwise} \end{cases}   (4)

where

H_\ell[k] = |Y_\ell[k]| - \alpha_\ell D_S[k].   (5)

In (4), \beta_s refers to the spectral floor parameter introduced to prevent any negative value in |Z_\ell[k]|. In (5), \alpha_\ell symbolizes the tracking factor, which tracks the change of amplitude of the non-stationary noise spectrum with time, and D_S[k] denotes the noise spectrum estimated in the previous silence frame. In the proposed spectral subtraction based noise reduction scheme, the noise spectrum estimated from the beginning silence frames is updated during each silence period as follows:

D_S[k] = \begin{cases} \frac{1}{N_s} \left( |Y_1[k]| + \cdots + |Y_{N_s}[k]| \right) & \text{for } S \le N_s \\ v_n D_P[k] + (1 - v_n) |Y_S[k]| & \text{otherwise} \end{cases}   (6)

where N_s is the number of initial silence frames, P refers to the index of the previous silence frame with respect to S, and v_n is the forgetting factor.
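The step-1 magnitude modification of (4)-(5) and the silence-period noise update of (6) can be sketched as below. This is an illustrative NumPy sketch: the constants beta_s and v_n here are placeholder values, not the ones from the paper's Table 1.

```python
import numpy as np

def spectral_subtract(Y_mag, D_est, alpha, beta_s=0.002):
    """One frame of the step-1 magnitude modification (Eqs. (4)-(5)).

    H[k] = |Y[k]| - alpha * D_S[k]; non-positive values are floored to a
    small fraction beta_s of |Y[k]| to avoid negative magnitudes.
    beta_s = 0.002 is an assumed spectral floor, not the paper's constant.
    """
    H = Y_mag - alpha * D_est
    return np.where(H > 0.0, H, beta_s * Y_mag)

def update_noise(D_prev, Y_mag_silence, v_n=0.9):
    """Recursive noise-spectrum update during a silence frame (Eq. (6)):
    D_S[k] = v_n * D_P[k] + (1 - v_n) * |Y_S[k]|.
    v_n = 0.9 is an assumed forgetting factor, not the paper's value."""
    return v_n * D_prev + (1.0 - v_n) * Y_mag_silence

Y_mag = np.array([1.0, 0.5, 0.1])
D_est = np.full(3, 0.2)
Z_mag = spectral_subtract(Y_mag, D_est, alpha=1.0)   # third bin hits the floor
D_new = update_noise(np.array([0.2]), np.array([0.4]))
```

The floor in (4) is what suppresses the negative residuals that would otherwise become audible musical noise.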
Considering that this estimate of the noise power spectrum is updated only during a silence period while the noise may change drastically with time, it is insufficient to use a constant value of the tracking factor to compensate for the errors induced in the noise spectrum to be subtracted from the noisy speech spectrum at each frame. In order to track the time variation of the noise, \alpha_\ell should be adjusted at each frame after a silence period. According to the spectral characteristics of human speech, the low-frequency band, typically from 0 to 50 Hz, contains no speech information. Thus, for noisy speech, the low-frequency band, say [0, 50] Hz, contains only noise. In view of this fact, in order to change the value of \alpha_\ell for the \ell-th frame, we propose to use the ratio of |Y_\ell[k]| and D_S[k] in the low-frequency band \Delta as

\alpha_\ell = \delta \, \frac{\sum_{k \in \Delta} |Y_\ell[k]|}{\sum_{k \in \Delta} D_S[k]}   (7)

where \Delta = [0, 50] Hz and \delta is a constant determined empirically. In the low-frequency band of the \ell-th frame, the variation of the noisy speech spectrum is equivalent to that of the noise spectrum of that frame. Thus, \alpha_\ell defined in (7) clearly serves as a relative weighting factor with respect to the estimated noise spectrum D_S[k], leading to a reasonable tracking of the time variation of the noise when it is non-stationary. Note that a voice activity detector from [1] is used in the proposed scheme for detecting the speech and silence frames.

Aggregating the modified magnitude spectrum with the unchanged phase of the noisy speech, we obtain a modified complex spectrum as

Z_\ell[k] = |Z_\ell[k]| \, e^{j \angle Y_\ell[k]}.   (8)

After applying the IFFT to Z_\ell[k] and overlap-adding the real part of the resulting signal, we obtain the time-domain intermediate speech z[n]. Applying the STFT to z[n] yields Z_t[k], where t is the frame number in step-2. In step-2, the modified complex spectrum Z_t[k] is further modified in such a way that the low-energy components cancel out more than the high-energy components. The modified complex spectrum thus obtained is a better representation of X_t[k].
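The tracking factor of (7) can be sketched as a simple band-limited ratio. The value delta = 1.0 below is an assumed placeholder; the paper determines delta empirically (Table 1).

```python
import numpy as np

def tracking_factor(Y_mag, D_est, freqs, delta=1.0):
    """Tracking factor of Eq. (7): delta times the ratio of the noisy-speech
    magnitude to the estimated noise magnitude, summed over the speech-free
    low band Delta = [0, 50] Hz."""
    low = (freqs >= 0.0) & (freqs <= 50.0)
    return delta * Y_mag[low].sum() / D_est[low].sum()

# With fs = 8000 Hz and a 1024-point frame, the bin spacing is ~7.8 Hz,
# so bins k = 0..6 fall inside the [0, 50] Hz band.
freqs = np.fft.rfftfreq(1024, d=1.0 / 8000)
# If the noisy spectrum is twice the noise estimate in the low band,
# the tracking factor scales the subtracted noise up accordingly.
alpha = tracking_factor(2.0 * np.ones_like(freqs), np.ones_like(freqs), freqs)
```

Since the low band carries no speech, a frame whose low-band energy doubles relative to D_S[k] signals that the noise level itself has doubled, and (7) scales the subtraction to match.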
X_t[k] = |Z_t[k]| \, e^{j \angle (Z_t[k] + \Lambda_t[k])}   (9)

The t-th frame of the intermediate speech, z_t[n], is a real-valued signal and therefore its FFT is conjugate symmetric, i.e.,

Z_t[k] = Z_t^*[N_t - k]   (10)

where N_t is the number of samples in a frame in step-2. The conjugates are obtained as a result of applying the FFT to z_t[n]; they arise naturally from the symmetry of the magnitude spectrum and the anti-symmetry of the phase spectrum. During the IFFT operation, as needed for synthesis of the enhanced speech, the conjugates are summed together to produce a larger real-valued signal. If the conjugates are modified, the degree to which they sum together can be influenced, and this can contribute constructively or destructively to the reconstruction of the enhanced time-domain speech. We propose that the degree of phase spectrum compensation be dependent on the SNR estimate of the current frame, thus facilitating the handling of time- and frequency-varying non-stationary noise conditions. For this purpose, we formulate a phase spectrum compensation function given by

\Lambda_t[k] = \lambda \, \Psi[k] \, V_t   (11)

where V_t is the root mean square value of Z_t, with Z_t = (Z_t[1], \ldots, Z_t[N_t])^T [18]. In (11), \lambda is a real-valued constant and \Psi[k] represents a weighting function expressed as

\Psi[k] = \begin{cases} 1 & \text{if } 0 < k < N_t/2 \\ -1 & \text{if } N_t/2 < k < N_t \\ 0 & \text{otherwise.} \end{cases}   (12)

Here, zero weighting is assigned to the values of k corresponding to the non-conjugate vectors of the FFT, namely k = 0 and k = N_t/2 if N_t is even.

Figure 2: \lambda as a function of the a posteriori SNR.

Unlike [18], instead of considering \lambda a constant, we propose to determine it as

\lambda = f(\gamma_t)   (13)

where \gamma_t is defined as

\gamma_t = \frac{|Y_t[k]|^2}{V_t^2}.   (14)

The right-hand side of (14) is the a posteriori SNR of the t-th intermediate speech frame, and the plot of \lambda against SNR is shown in Fig. 2. It is seen from Fig. 2 that as the SNR increases, the value of \lambda decreases and the phase compensation becomes smaller, so that there is no distortion of the signal. On the contrary, when the noise increases to a higher level, \lambda increases. As a result, the phase compensation on the signal increases and denoising is obtained to a significant extent. Since the noise magnitude estimate V_t is constant within a frame, introduction of the weighting function \Psi[k] defined by (12) produces an anti-symmetric compensation function \Lambda_t[k] that changes the angular phase relationship in order to achieve noise cancellation during synthesis. Although a detailed explanation of the phase compensation method is given in [18], we revisit it for clarity of our method. The two cases of a single conjugate pair and their corresponding modifications, i.e., when the estimated speech vector from the first step is larger or smaller than the phase compensation function, are presented in Fig. 3, where both the time and frequency indexes are omitted for convenience and clarity. We denote the phase compensation function as \Lambda, the two conjugates of Z_t[k] as Z and Z^*, and those of X_t[k] as X and X^*. For the representation in Fig. 3(a), the magnitudes of Z and Z^* are considered larger than \Lambda. Column one of Fig. 3(a) shows the conjugate vectors Z and Z^* as well as their summation vector Z + Z^*; in column two, the real parts of Z and Z^* are shown to be offset by \Lambda and -\Lambda, respectively.
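The anti-symmetric weighting of (12) and the SNR-dependent compensation of (9), (11), (13)-(14) can be sketched as follows. Since the exact lambda(SNR) mapping of (13)/Fig. 2 is not reproduced here, the code takes it as a caller-supplied function, and a simple monotonically decreasing choice is assumed in the example.

```python
import numpy as np

def antisym_weight(N):
    """Anti-symmetric weighting Psi[k] of Eq. (12): +1 for 0 < k < N/2,
    -1 for N/2 < k < N, and 0 on the non-conjugate bins k = 0 and
    k = N/2 (N even)."""
    psi = np.zeros(N)
    psi[1:(N + 1) // 2] = 1.0
    psi[N // 2 + 1:] = -1.0
    return psi

def psc_frame(Z, lam_of_snr):
    """One frame of SNR-dependent phase compensation (Eqs. (9), (11)-(14)).

    lam_of_snr stands in for the decreasing lambda(SNR) function of Fig. 2,
    whose exact form is not specified here.
    """
    V = np.sqrt(np.mean(np.abs(Z) ** 2))       # RMS of the frame spectrum, V_t
    gamma = np.abs(Z) ** 2 / V ** 2            # a posteriori SNR (Eq. (14))
    Lam = lam_of_snr(gamma) * antisym_weight(len(Z)) * V   # Eq. (11)
    # Eq. (9): keep the magnitude, take the phase of the offset spectrum.
    return np.abs(Z) * np.exp(1j * np.angle(Z + Lam))

Z = np.fft.fft(np.random.randn(256))
X = psc_frame(Z, lambda g: 1.0 / (1.0 + g))    # assumed lambda(SNR) shape
```

Note that low-magnitude bins (gamma small) receive a large offset relative to their length, so their conjugate pairs are pushed far out of phase and largely cancel at synthesis, while high-magnitude bins are barely perturbed.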
Altering the angles of the vectors Z and Z^* while keeping their magnitudes unchanged thus produces the vectors X and X^*, respectively. It is seen from column three that the vector X + X^* is produced as a result of adding the modified vectors X and X^*. Column four demonstrates the real part of the summation vector X + X^*, while its imaginary part is discarded in order to avoid complex time-domain frames after the IFFT operation. Comparing columns one and four of Fig. 3(a), it is clear that only a limited change of the original signal occurs if |Z| and |Z^*| are greater than \Lambda. In Fig. 3(b), a similar illustration is shown for |Z| and |Z^*| smaller than \Lambda, and it is found that a significant change of the original signal occurs. Since \Lambda is anti-symmetric, the angles of the conjugate pair in each case of Fig. 3 are pushed in opposite directions, one towards 0 radians and the other towards \pi radians. The further they are pushed apart, the more out of phase they become. This justifies that FFT components of the noisy speech with larger magnitude undergo less attenuation and those with smaller magnitude undergo more.

Figure 3: Phase spectrum compensation (a) when |Z| > \Lambda, (b) when |Z| < \Lambda.

The enhanced speech frame is synthesized by performing the IFFT on the resulting X_t[k],

x_t[n] = \mathrm{Re}\{\mathrm{IFFT}(X_t[k])\}   (15)

where \mathrm{Re}(\cdot) denotes the real part of its argument and x_t[n] represents the enhanced speech frame. The final enhanced speech is synthesized by using the standard overlap-add method [21].
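The synthesis of (15) plus the overlap-add reconstruction [21] can be sketched as below; this is a generic NumPy illustration rather than the paper's implementation, and it omits the synthesis window for brevity.

```python
import numpy as np

def overlap_add(frames_spec, hop):
    """Synthesis stage of Eq. (15): x_t[n] = Re{IFFT(X_t[k])}, followed by
    standard overlap-add reconstruction of the full-length signal."""
    frames = np.real(np.fft.ifft(frames_spec, axis=1))
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + frame_len] += frame   # accumulate overlaps
    return out

# Sanity check: with non-overlapping rectangular frames, the round trip
# FFT -> IFFT -> overlap-add recovers the signal exactly.
x = np.random.randn(512)
spec = np.fft.fft(x.reshape(4, 128), axis=1)
y = overlap_add(spec, hop=128)
```

Taking only the real part is what realizes the constructive/destructive conjugate summation described above: any residual imaginary component produced by the phase offsets is discarded.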
3. Results
In this section, a number of simulations are carried out to evaluate the performance of the proposed method.
The proposed method, which we call non-stationary noise-driven spectral subtraction with SNR-dependent phase compensation (NSSP), is implemented in the MATLAB R2016b graphical user interface development environment (GUIDE). The MATLAB software with its user manual is attached as supplementary material to the paper. This software also includes implementations of some recent methods, i.e., multi-band spectral subtraction (MBSS) [22], PSC [24] and SMPO [23].

Table 1: Constants used in the spectral subtraction step
Constant    Value
v_n

Real speech sentences from the NOIZEUS database are employed for the experiments, where the speech data are sampled at 8 kHz. To imitate a noisy environment, a noise sequence is added to the clean speech samples at different SNR levels ranging from 10 dB down to -20 dB. Two different types of noise, babble and street, are adopted from the NOIZEUS database.

In order to obtain overlapping analysis frames in the spectral subtraction step, a Hamming windowing operation is performed, where the size of each frame is 96 samples with 50% overlap between successive frames. In the phase compensation step, Griffin and Lim's modified Hanning window is used, and the size of each frame is 256 samples with 25% overlap. The values of the constants used in the first step are given in Table 1.
Standard objective metrics [24], namely segmental SNR (SNRSeg) improvement in dB, overall SNR improvement in dB, and perceptual evaluation of speech quality (PESQ), are used for the evaluation of the proposed NSSP method. The proposed method is subjectively evaluated in terms of the spectrogram representations of the clean speech, noisy speech and enhanced speech. Formal listening tests are also carried out in order to find the analogy between the objective metrics and subjective sound quality. The performance of our method is compared with MBSS [22], PSC [24] and SMPO [23] in both objective and subjective senses.
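As a point of reference, the segmental SNR metric can be sketched as below. A standard definition (frame-wise SNR clipped to [-10, 35] dB, then averaged) is assumed; the exact variant used by the evaluation toolkit may differ.

```python
import numpy as np

def snr_seg(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR in dB between clean and enhanced speech: per-frame
    SNR, clipped to [lo, hi] as is conventional, averaged over frames."""
    n = min(len(clean), len(enhanced)) // frame_len * frame_len
    c = clean[:n].reshape(-1, frame_len)
    e = enhanced[:n].reshape(-1, frame_len)
    num = np.sum(c ** 2, axis=1)                    # clean frame energy
    den = np.sum((c - e) ** 2, axis=1) + 1e-12      # residual noise energy
    return float(np.mean(np.clip(10.0 * np.log10(num / den + 1e-12), lo, hi)))
```

The clipping keeps near-silent frames (where the ratio blows up or collapses) from dominating the average, which is why SNRSeg tracks perceived quality more closely than overall SNR.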
SNRSeg improvement, overall SNR improvement and PESQ scores for speech signals corrupted with street noise for MBSS, PSC, SMPO and NSSP are shown for an SNR range of -20 dB to 10 dB in Figs. 4 and 5 and Table 2.

In Fig. 4, we see that the SNRSeg improvement for NSSP is the highest at the lowest SNR of -20 dB. The nearest SNRSeg improvement is shown by SMPO, which is almost half of the SNRSeg improvement of NSSP. With increasing SNR, the SNRSeg improvement for NSSP decreases. But at the highest SNR of 10 dB, NSSP shows an SNRSeg improvement of 4.1 dB, which is much better than MBSS, SMPO and PSC. Another interesting fact is that the SNRSeg improvement for NSSP increases monotonically with decreasing SNR, whereas the SNRSeg improvement for the other methods increases only down to an SNR of -5 dB and then starts to decrease. The higher SNRSeg improvement of the proposed NSSP method at all SNRs attests that NSSP can enhance the noisy speech better than the other competing methods in favorable as well as adverse environments.

In Fig. 5, where we plot the overall SNR improvement for all the methods over the SNR range of -20 to 10 dB, we see that NSSP provides an excellent overall SNR improvement of 14 dB at an SNR level of -20 dB. The other methods provide overall SNR improvements of 11, 9 and 8.2 dB at that SNR level. NSSP continues to provide higher overall SNR
Figure 4: SNRSeg improvement for different methods in street noise.

Table 2: PESQ for different methods
SNR (dB)   MBSS   PSC    SMPO   NSSP
-20        1.15   1.16   1.35   1.30
-15        1.37   1.23   1.47   1.51
-10        1.51   1.32   1.65   1.65
-5         1.69   1.43   1.77   1.83
0          2.07   1.69   1.89   1.74
5          2.38   1.93   2.57   2.45
10         2.60   2.14   2.78   2.69
Figure 5: Overall SNR improvement for different methods in street noise.

improvement up to 0 dB; from there to 10 dB SNR, it provides competitive improvements in comparison to PSC and better ones than SMPO and MBSS.

PESQ values for the different methods at all SNR levels for street noise-corrupted speech are shown in Table 2. At a higher SNR such as 10 dB, we see that all the methods provide better PESQ. But as the SNR decreases, the PESQ values in all cases start to fall. The proposed method provides very competitive PESQ values at all SNR levels in comparison to SMPO but performs better than the other two competing methods. As the PESQ value indicates the perceptual quality of the enhanced speech, this table proves that the proposed method provides better enhanced speech for street noise-corrupted speech at high as well as low SNRs than MBSS and PSC.
SNRSeg improvement, overall SNR improvement and PESQ scores for speech signals corrupted with babble noise for MBSS, PSC, SMPO and NSSP are shown in Figs. 6, 7 and 8.

In Fig. 6, the performance of NSSP is compared with that of the other methods at different SNR levels in terms of SNRSeg improvement. From this figure, we see that the SNRSeg improvement in dB increases as the SNR decreases for NSSP at all SNR levels. This is not true for MBSS, PSC and SMPO: below an SNR level of -10 dB, most of these three methods start to lose SNRSeg improvement. At a low SNR of -20 dB, NSSP yields the highest SNRSeg improvement of more than 6 dB. Such large values of SNRSeg improvement at a low SNR attest to the capability of NSSP in producing enhanced speech of better quality for speech severely corrupted by babble noise.

The overall SNR improvements of MBSS, PSC, SMPO and NSSP are shown in Fig. 7, where it is seen that NSSP provides an improvement of almost 18 dB at an SNR level of -20 dB, which is significantly better than the other methods. This trend continues up to 0 dB; after that, NSSP provides competitive values in comparison to PSC and SMPO.

PESQ values for the different methods are shown in Fig. 8 for noisy speech in babble noise. We see from this figure that while NSSP provides competitive PESQ scores in comparison to the other methods for SNR levels of -20 to -10 dB, it provides higher PESQ scores at all other SNR levels.
Figure 6: SNRSeg improvement for different methods in babble noise.
To evaluate the performance of the proposed method and the other competing methods subjectively, we use two commonly used tools. The first is plotting the spectrograms of the outputs of all the methods and comparing their performance in terms of preservation of harmonics and capability of removing noise.

The spectrograms of the clean speech, the noisy speech, and the enhanced speech signals obtained by using the proposed method and all other methods are presented in Fig. 9 for street noise-corrupted speech at an SNR of 10 dB. It is obvious from the spectrograms that the proposed method preserves the harmonics significantly better than all the other competing methods. The noise is also reduced at every time point for the proposed method, which attests our claim of better performance in terms of higher SNRSeg improvement, higher overall SNR improvement and higher PESQ values in the objective evaluation. Another collection of spectrograms, for speech signals corrupted with babble noise, is shown in Fig. 10. This figure also attests that our proposed method has better performance in terms of harmonics preservation and noise removal in the presence of babble noise.
Figure 7: Overall SNR improvement for different methods in babble noise.

Figure 8: PESQ for different methods in babble noise.
The second tool we use for subjective evaluation of the proposed method and the competing methods is formal listening tests. We add street and babble noise to all thirty speech sentences of the NOIZEUS database at -20 to 10 dB SNR levels and process them with all the competing methods. We ask ten listeners to listen to the enhanced speech from these methods and evaluate it subjectively. Following [13] and [25], we use the SIG, BAK and OVL scales on a range of 1 to 5. The details of these scales and the procedure of this listening test are discussed in [13]. More details on this listening test methodology can be obtained from [26].

We show the mean scores of the SIG, BAK and OVL scales for all the methods for speech signals corrupted with 10 dB street noise in Tables 3, 4 and 5, and for speech signals corrupted with 10 dB babble noise in Tables 6, 7 and 8. The higher values for the proposed method in comparison to the other methods in these tables clearly attest that the proposed method is better than the competing methods in terms of lower signal distortion (higher SIG scores), efficient noise removal (higher BAK scores) and overall sound quality (higher OVL scores) at all SNR levels.

Table 3: Mean scores of SIG scale for different methods in presence of street noise at 10 dB
Listener   MBSS   PSC   SMPO   NSSP

Table 4: Mean scores of BAK scale for different methods in presence of street noise at 10 dB

Listener   MBSS   PSC   SMPO   NSSP

Table 5: Mean scores of OVL scale for different methods in presence of street noise at 10 dB

Listener   MBSS   PSC   SMPO   NSSP

Table 6: Mean scores of SIG scale for different methods in presence of babble noise at 10 dB

Listener   MBSS   PSC   SMPO   NSSP

Table 7: Mean scores of BAK scale for different methods in presence of babble noise at 10 dB

Listener   MBSS   PSC   SMPO   NSSP

Table 8: Mean scores of OVL scale for different methods in presence of babble noise at 10 dB

Listener   MBSS   PSC   SMPO   NSSP
Figure 9: Spectrograms of (a) clean signal, (b) noisy signal with 10 dB street noise; spectrograms of enhanced speech from (c) MBSS, (d) PSC, (e) SMPO, (f) NSSP.
4. Conclusions
An improved speech enhancement method based on magnitude and phase compensation is presented in this paper for the enhancement of noisy speech in adverse environments. Spectral subtraction is used in the first step for magnitude compensation based on a new non-stationary noise estimation. In the second step, an SNR-dependent phase spectrum compensation is used to compensate the phase. For noisy speech with medium to low levels of SNR,
Figure 10: Spectrograms of (a) clean signal, (b) noisy signal with 10 dB babble noise; spectrograms of enhanced speech from (c) MBSS, (d) PSC, (e) SMPO, (f) NSSP.

simulation results show that the proposed method yields consistently better results in the sense of higher segmental SNR improvement, overall SNR improvement, and output PESQ than those of the existing speech enhancement methods.

References

[1] S. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing 27 (2) (1979) 113–120.
[2] K. Yamashita, T. Shimamura, Nonstationary noise estimation using low-frequency regions for spectral subtraction, IEEE Signal Processing Letters 12 (6) (2005) 465–468.
[3] Y. Lu, P. C. Loizou, A geometric approach to spectral subtraction, Speech Communication 50 (6) (2008) 453–466.
[4] M. T. Islam, C. Shahnaz, S. Fattah, Speech enhancement based on a modified spectral subtraction method, in: 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE, 2014, pp. 1085–1088.
[5] M. T. Islam, A. B. Hussain, K. T. Shahid, U. Saha, C. Shahnaz, Speech enhancement based on noise compensated magnitude spectrum, in: Informatics, Electronics & Vision (ICIEV), 2014 International Conference on, IEEE, 2014, pp. 1–5.
[6] Y. Ephraim, H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing 3 (4) (1995) 251–266.
[7] Y. Hu, P. C. Loizou, A generalized subspace approach for enhancing speech corrupted by colored noise, IEEE Transactions on Speech and Audio Processing 11 (4) (2003) 334–341.
[8] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing 33 (2) (1985) 443–445.
[9] Y. Hu, P. C. Loizou, Subjective comparison and evaluation of speech enhancement algorithms, Speech Communication 49 (7) (2007) 588–601.
[10] D. L. Donoho, De-noising by soft-thresholding, IEEE Transactions on Information Theory 41 (3) (1995) 613–627.
[11] M. Bahoura, J. Rouat, Wavelet speech enhancement based on the Teager energy operator, IEEE Signal Processing Letters 8 (2001) 10–12.
[12] Y. Ghanbari, M. Mollaei, A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets, Speech Communication 48 (2006) 927–940.
[13] M. T. Islam, C. Shahnaz, W.-P. Zhu, M. O. Ahmad, Speech enhancement based on student t modeling of Teager energy operated perceptual wavelet packet coefficients and a custom thresholding function, IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (11) (2015) 1800–1811.
[14] N. Ma, M. Bouchard, R. A. Goubran, Speech enhancement using a masking threshold constrained Kalman filter and its heuristic implementations, IEEE Transactions on Audio, Speech, and Language Processing 14 (1) (2006) 19–32.
[15] K. Paliwal, Estimation of noise variance from the noisy AR signal and its application in speech enhancement, IEEE Transactions on Acoustics, Speech, and Signal Processing 36 (2) (1988) 292–294.
[16] R. Martin, Noise power spectral density estimation based on optimal smoothing and minimum statistics, IEEE Transactions on Speech and Audio Processing 9 (5) (2001) 504–512.
[17] J. Yamauchi, T. Shimamura, Noise estimation using high frequency regions for spectral subtraction, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 85 (3) (2002) 723–727.
[18] K. Wójcicki, M. Milacic, A. Stark, J. Lyons, K. Paliwal, Exploiting conjugate symmetry of the short-time Fourier spectrum for speech enhancement, IEEE Signal Processing Letters 15 (2008) 461–464.
[19] A. P. Stark, K. K. Wójcicki, J. G. Lyons, K. K. Paliwal, Noise driven short-time phase spectrum compensation procedure for speech enhancement, in: INTERSPEECH, 2008, pp. 549–552.
[20] M. T. Islam, C. Shahnaz, Speech enhancement based on noise-compensated phase spectrum, in: Electrical Engineering and Information & Communication Technology (ICEEICT), 2014 International Conference on, IEEE, 2014, pp. 1–5.
[21] D. O'Shaughnessy, Speech Communication: Human and Machine, Universities Press, 1987.
[22] S. Kamath, P. Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, in: IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 4, 2002, pp. 4164–4164.
[23] Y. Lu, P. C. Loizou, Estimators of the magnitude-squared spectrum and methods for incorporating SNR uncertainty, IEEE Transactions on Audio, Speech, and Language Processing 19 (5) (2011) 1123–1137.
[24] Y. Hu, P. C. Loizou, Evaluation of objective quality measures for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing 16 (1) (2008) 229–238.
[25] Y. Hu, P. Loizou, Subjective comparison and evaluation of speech enhancement algorithms, Speech Communication 49 (2007) 588–601.
[26] ITU, P.835: Subjective test methodology for evaluating speech communication systems that include noise suppression algorithms, ITU-T Recommendation (ITU, Geneva), 2003.