Investigation of Phase Distortion on Perceived Speech Quality for Hearing-impaired Listeners
Zhuohuang Zhang, Donald S. Williamson, Yi Shen
Department of Speech, Language and Hearing Sciences, Indiana University, USA
Department of Computer Science, Indiana University, USA
[email protected], {williads, shen2}@indiana.edu

Abstract
Phase serves as a critical component of speech that influences both quality and intelligibility. Current speech enhancement algorithms are beginning to address phase distortions, but these algorithms focus on normal-hearing (NH) listeners. It is not clear whether phase enhancement is beneficial for hearing-impaired (HI) listeners. We investigated the influence of phase distortion on speech quality through a listening study, in which NH and HI listeners provided speech-quality ratings using the MUSHRA procedure. In one set of conditions, the speech was mixed with babble noise at 4 different signal-to-noise ratios (SNRs) from -5 to 10 dB. In another set of conditions, the SNR was fixed at 10 dB and the noisy speech was presented in a simulated reverberant room with T60s ranging from 100 to 1000 ms. The speech level was kept at 65 dB SPL for NH listeners, and amplification was applied for HI listeners to ensure audibility. Ideal ratio masking (IRM) was used to simulate speech enhancement. Two objective metrics (i.e., PESQ and HASQI) were utilized to compare subjective and objective ratings. Results indicate that phase distortion has a negative impact on perceived quality for both groups and that PESQ is more closely correlated with human ratings.
Index Terms: phase distortion, speech quality perception, objective metric, human listening study
1. Introduction
Phase is a critical component of a speech signal, and it makes important contributions to speech intelligibility and perceived quality. When analyzing a speech signal in the time-frequency (T-F) domain by performing short-time Fourier transform (STFT) analysis, each time-by-frequency location in the resulting spectrogram contains not only magnitude but also phase information. At each time frame, the phase spectrum represents how the various frequency components in the speech signal are temporally aligned. At each frequency, the progression of phase across consecutive time frames represents the temporal fine structure (TFS) of the speech signal, which is important to the perception of talker gender, voicing, and intonation [1]. Replacing or distorting the phase information when reconstructing speech from the spectrogram leads to degraded speech intelligibility [2].

Many existing speech enhancement systems operate only on the magnitude spectrogram and keep the noisy phase unchanged when converting the enhanced speech to the time domain [3, 4, 5]. Recently, a number of studies have shown that better phase estimation of the original speech improves both subjective and objective speech quality [6, 7]. However, these studies on the importance of phase in speech enhancement have only focused on normal-hearing (NH) listeners. Approximately one third of adults above the age of 65 years in the United States suffer from hearing loss [8]. Modern digital hearing aids, besides amplifying the acoustic signal, also include built-in speech enhancement algorithms to remove unwanted background noise that corrupts the speech [9, 10]. This enhancement is performed before amplification.
Before phase-preserving speech enhancement algorithms can be implemented in hearing aids, it is necessary to evaluate whether hearing-impaired (HI) individuals would actually benefit from them. It is known that HI listeners have poorer sensitivity to TFS [11, 12] and benefit less from TFS cues for speech understanding [13, 14]. Therefore, preserving the phase information during speech enhancement may not lead to the same degree of benefit for HI listeners as for NH listeners. In order to optimize speech enhancement algorithms for the HI population, it is crucial to quantify their sensitivity to phase distortions, especially the distortions that remain in the enhanced speech following phase-insensitive enhancement. If HI listeners consistently rate phase-distorted speech as having lower quality, even after phase-insensitive enhancement, this would suggest a potential benefit for a phase-sensitive system. Furthermore, objective speech-quality metrics developed for NH listeners may not directly generalize to HI listeners [15]. Therefore, it is not clear whether existing speech-quality metrics adequately capture the effect of phase distortion for HI listeners.

To address these open questions, we collected human quality ratings on noisy speech signals with different degrees of phase distortion (or, equivalently, distortion to the TFS) from both NH and HI listeners using the MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) procedure [16]. The quality ratings were repeated on speech signals with and without processing by a common phase-insensitive algorithm based on an ideal ratio mask (IRM) in the T-F domain [3]. This allowed us to investigate whether the quality perceived by NH and HI listeners would be adversely affected if phase distortion remained in the enhanced speech following traditional, magnitude-based enhancement.
Comparing the ratings from these two conditions also reveals the expected benefits from a phase-insensitive speech enhancement system. To assess the agreement between subjective and objective speech quality, the subjective quality ratings were compared against two objective metrics, namely the perceptual evaluation of speech quality (PESQ) [17] and the hearing-aid speech quality index (HASQI) [15].

In the following, the implementation of phase distortion and the procedure for the subjective listening test are described in Section 2. In Section 3, the results obtained from the listening test and their correlations with the objective metrics are presented. Finally, conclusions are drawn in Section 4.

2. Methods

Phase distortion was artificially applied to the speech materials by introducing random perturbations to the phase spectrogram using the following steps. First, both the magnitude spectrogram |s(t, f)| and the phase spectrogram ∠s(t, f) in the T-F domain were extracted. A window size of 25 ms with a step size of 10 ms was used during STFT analysis. Second, four degrees (i.e., 25%, 50%, 75%, and 100%) of phase distortion were applied to the phase spectrogram according to

    ∠s(t, f)_distorted = ∠s(t, f) + α · φ(t, f),    (1)

where ∠s(t, f)_distorted denotes the distorted phase in the T-F domain; α denotes the amount of phase distortion, ranging from 25% to 100%; and φ(t, f) represents random phase perturbations drawn from a uniform distribution between 0 and π, independently for each T-F location. Similar distortion amounts were used in earlier studies involving NH listeners [18, 19, 20]. These amounts also reflect potential errors due to inaccurate phase enhancement. Finally, the phase-distorted speech was resynthesized with the inverse STFT that combines the original magnitude spectrogram and the distorted phase spectrogram.
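As a concrete illustration, the distortion in Eq. (1) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function name and defaults are ours, and we assume a 16 kHz mono signal with the 25 ms / 10 ms STFT settings given above.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_phase_distortion(x, fs=16000, alpha=0.5, seed=None):
    """Perturb the phase spectrogram of x by alpha * phi(t, f), with
    phi drawn uniformly from [0, pi] per T-F bin, as in Eq. (1)."""
    nperseg = int(0.025 * fs)               # 25 ms analysis window
    noverlap = nperseg - int(0.010 * fs)    # 10 ms step size
    _, _, S = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, np.pi, size=S.shape)   # random perturbation
    # Keep the original magnitude; distort only the phase.
    S_distorted = np.abs(S) * np.exp(1j * (np.angle(S) + alpha * phi))
    _, x_distorted = istft(S_distorted, fs=fs, nperseg=nperseg,
                           noverlap=noverlap)
    return x_distorted
```

With alpha = 0 the round trip through the STFT reconstructs the input essentially exactly, which is a useful sanity check before applying the four distortion degrees (alpha = 0.25 to 1.0).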
A total of 18 participants were recruited, including 10 NH listeners (4 males, 6 females, recruited from the undergraduate population at Indiana University) and 8 HI listeners (3 males, 5 females, average age: 68, SD = 5. ). All participants were native speakers of American English. Audiometric evaluations were performed on all NH listeners, including otoscopy and air-conduction pure-tone audiometry. All NH listeners had audiometric thresholds below 20 dB HL from 250 to 6000 Hz. For the HI listeners, the hearing evaluation additionally included bone-conduction audiometry, tympanometry, and a hearing-related case history. All HI listeners had at least mild symmetric hearing loss of a sensorineural origin. The average audiometric thresholds for the two groups are listed in Table 1. The current study was conducted following the Declaration of Helsinki and approved by the Institutional Review Board (IRB) at Indiana University. Informed consent was obtained from all participants before data collection.

Speech utterances from the IEEE corpus [21] produced by a female talker were adopted for the listening test. To ensure speech audibility for HI listeners, a standard hearing-aid prescription formula (i.e., NAL-R) [22] was used to amplify the speech stimuli. This formula prescribes linear gains for various frequency regions according to the degrees of hearing loss in those regions. The speech level was calibrated to 65 dB SPL before amplification. All signals were resampled to 16 kHz before further processing.

There were four test conditions in the listening test. In the Noisy condition, the speech stimuli were presented with a simultaneous 10-talker babble from the AzBio database [23] at 4 different signal-to-noise ratios (SNRs), from -5 dB to 10 dB in 5 dB steps. In the Noisy-Enhanced condition, the stimuli were the same as those in the Noisy condition except that they were further masked by the IRM [3] before presentation.
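Mixing speech with babble at a target SNR, as in the Noisy condition, amounts to scaling the noise so that the speech-to-noise power ratio matches the target. A minimal sketch (the function name and details are ours, not from the original setup):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it so that the resulting
    speech-to-noise power ratio equals snr_db (in dB)."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Repeating this at -5, 0, 5, and 10 dB reproduces the four SNR levels used in the listening test.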
Table 1: Average auditory thresholds of participants from the NH and HI groups, with standard deviations in parentheses. (Columns: frequency at 0.25, 0.5, 1, 2, 4, and 6 kHz, in dB HL; rows: NH and HI.)

In the Reverberant condition, the SNR between the speech and noise was fixed at 10 dB and the stimuli were presented with simulated reverberation [24]. The reverberation algorithm simulated a rectangular room (length × width × height, with a height of 3 m); the sound source was located at (2 m, 3.5 m, 2 m), and the listener was located at (2 m, 1.5 m, 2 m). The sound velocity was assumed to be 340 m/s. The reverberation times (T60) were 100, 200, 500, and 1000 ms. In the Reverberant-Enhanced condition, the stimuli were the same as those in the Reverberant condition except that they were further masked by the IRM before presentation. Note that the IRM in this condition was applied only to the noise without removing reverberation, which resembles a system that is not trained on reverberant data. The noisy and reverberant conditions (with and without enhancement) were chosen because they follow, but extend, a similar approach that studied the importance of phase for NH listeners [6].

Since most HI listeners have high-frequency hearing loss, all stimuli were low-pass filtered at 4 kHz. This ensures that the perceived speech quality is not dominated by hearing loss in the high-frequency bands. All stimuli were presented monaurally to the participant's better ear, based on the hearing-screening results. A 24-bit soundcard (Microbook II, Mark of the Unicorn, Inc.) and a pair of headphones (HD280 Pro, Sennheiser electronic GmbH and Co. KG) were used. The participants were seated in a sound-attenuating booth during the study.

Listeners provided subjective ratings on the stimuli following the MUSHRA procedure, recommended in ITU-R BS.1534 [16]. During the experiment, a graphical user interface (GUI) was shown on a computer screen in front of the listener. The user interface contained a "Reference" button that corresponded to a reference stimulus, which was the original clean speech low-pass filtered at 4 kHz. Six additional buttons represented the six test stimuli, which were six versions of the same sentence as the reference.
One of these buttons corresponded to the reference stimulus; one button corresponded to a hidden anchor stimulus, which was the original clean speech low-pass filtered at 2 kHz; the remaining four buttons corresponded to the phase-distorted speech with the four degrees of phase distortion. The correspondence between the buttons and the stimuli was randomized from trial to trial. On each trial, the listener clicked on each of the buttons to hear the corresponding speech stimulus and rated its quality on a scale from 1 to 100 using a slider next to the button. The listener was instructed that the quality of the reference stimulus corresponded to a rating of "100". The listener was able to play the reference and test stimuli more than once.

The experiment was completed in two 2-hour sessions. For half of the listeners in each group, the Noisy and Noisy-Enhanced conditions were tested in the first session while the Reverberant and Reverberant-Enhanced conditions were tested in the second session. For the other half of the listeners, the test sequence for the two sessions was reversed. At the beginning of each session, eight practice trials were run to familiarize the listener with the stimuli and the GUI. If the Noisy and Noisy-Enhanced conditions were tested in the session, the practice trials included stimuli at the four different SNRs, with and without the IRM-based enhancement. After the practice trials, the two experimental conditions were tested using two blocks of 40 trials. The order in which the two conditions were tested was counterbalanced across listeners. Within each block, 10 trials were run at each of the SNRs, in random order. If the Reverberant and Reverberant-Enhanced conditions were tested in the session, the practice trials included stimuli at the four different values of T60, with and without the IRM-based enhancement.
Following the practice trials, the two experimental conditions were tested in blocks, with each block containing 10 trials at each of the T60 values in random order. No sentence was repeated in more than one trial, leading to a total of 176 unique sentences used in the current experiment. Due to limited availability, one HI listener did not finish the Reverberant and Reverberant-Enhanced conditions and another HI listener did not finish the Noisy and Noisy-Enhanced conditions, resulting in seven listeners in the HI group for each condition. For the data collected from each session, a mixed-effects analysis of variance (ANOVA) was conducted to identify any significant effects of listener group, speech enhancement, SNR/T60, and phase distortion, and any significant interactions among them.

Two objective quality metrics, PESQ and HASQI, were adopted to further investigate the correlations between objective measures and actual human ratings, especially on speech signals with distorted phase under noisy and reverberant conditions. PESQ is a widely adopted metric for speech quality assessment that produces outputs ranging from -0.5 to 4.5, while HASQI is a more recently proposed speech quality metric that includes a physiologically inspired model of the human auditory system, with predicted scores ranging from 0 to 1. The inclusion of this model allows HASQI to predict the perceived quality for both NH and HI listeners. The same stimuli (with NAL-R linear amplification and low-pass filtering) presented to each subject were given as inputs to both evaluation metrics.
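For reference, the oracle IRM-based, phase-insensitive enhancement used in the Enhanced conditions can be sketched as follows. This is a simplified sketch of the mask defined in [3] (with the common exponent β = 0.5); the function name, STFT settings, and the small stabilizing constant are our assumptions, not the authors' code.

```python
import numpy as np
from scipy.signal import stft, istft

def irm_enhance(noisy, clean, noise, fs=16000, beta=0.5):
    """Apply an oracle ideal ratio mask (IRM) to the noisy magnitude,
    keeping the noisy phase unchanged (phase-insensitive enhancement)."""
    kw = dict(fs=fs, nperseg=int(0.025 * fs),
              noverlap=int(0.025 * fs) - int(0.010 * fs))
    _, _, Y = stft(noisy, **kw)   # noisy mixture
    _, _, S = stft(clean, **kw)   # clean speech (oracle)
    _, _, N = stft(noise, **kw)   # noise (oracle)
    # IRM = (|S|^2 / (|S|^2 + |N|^2))^beta, per T-F bin.
    irm = (np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)) ** beta
    # Mask the noisy magnitude; reuse the noisy phase unchanged.
    enhanced = irm * np.abs(Y) * np.exp(1j * np.angle(Y))
    _, x = istft(enhanced, **kw)
    return x
```

Because the noisy phase is carried over unchanged, any phase distortion present in the mixture survives this enhancement step, which is exactly the situation probed in the Noisy-Enhanced and Reverberant-Enhanced conditions.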
3. Results and discussion
Figs. 1 and 2 show the subjective ratings for the four different degrees of phase distortion and the four SNRs in the Noisy and Noisy-Enhanced conditions. The error bars indicate ± one standard deviation. A mixed-design ANOVA shows significant main effects of listener group [F(1, 15) = 6. , p = . ], enhancement [F(1, 15) = 68. , p < . ], SNR [F(1. , .1) = 90. , p < . , Greenhouse-Geisser corrected], and phase distortion [F(1. , .6) = 77. , p < . , Greenhouse-Geisser corrected]. There are significant interactions between listener group and SNR [F(1. , .1) = 3. , p = . , Greenhouse-Geisser corrected], between enhancement and SNR [F(1. , .7) = 24. , p < . , Greenhouse-Geisser corrected], between enhancement and phase distortion [F(1. , .9) = 47. , p < . , Greenhouse-Geisser corrected], and between SNR and phase distortion [F(9, . ), p < . ], as well as significant three-way interactions among listener group, enhancement, and SNR [F(1. , .65) = 5. , p = . , Greenhouse-Geisser corrected] and among enhancement, SNR, and phase distortion [F(3. , .9) = 4. , p < . , Greenhouse-Geisser corrected].

Figure 1: Human ratings under the Noisy condition; NH listeners are represented in the left block and the HI listeners are shown in the right block.

Figure 2: Human ratings under the Noisy-Enhanced condition; NH listeners are represented in the left block and the HI listeners are shown in the right block.

For the Noisy condition (Fig. 1), higher SNRs lead to higher quality ratings and greater degrees of phase distortion lead to lower ratings for both listener groups. The effects of SNR and phase distortion also interact with each other, with stronger effects of phase distortion observed at higher SNRs. When the noise level is high (i.e., at low SNRs), the phase distortion may be masked by the noise and become less noticeable to the listeners. The HI listeners tend to give higher ratings than the NH listeners, suggesting that they have a higher tolerance for phase distortion and background noise.

For the Noisy-Enhanced condition (Fig. 2), the quality ratings are generally higher than those in the Noisy condition, indicating that the IRM-based enhancement improved perceived quality. Contrary to the strong effect of SNR in the Noisy condition, the effect of SNR is not reliably observed in the Noisy-Enhanced condition across all phase distortions. This suggests that, following phase-insensitive enhancement, the contribution of the background noise to the quality ratings is much reduced. Instead, the quality ratings become dominated by phase distortion, especially when the distortion amount is above 25% for the enhanced speech. For both listener groups, the quality rating decreases as the degree of phase distortion increases. In particular, the enhancement algorithm allows both the NH and HI listeners to better differentiate various degrees of phase distortion at low SNRs. Therefore, a speech enhancement algorithm that is capable of reducing phase distortions would likely lead to benefits in perceived speech quality for all listeners (with or without hearing loss), at both high and low SNRs.

Figs. 3 and 4 show the subjective ratings for the four different degrees of phase distortion and the four T60 values in the Reverberant and Reverberant-Enhanced conditions. The error bars indicate ± one standard deviation. A mixed-design ANOVA shows significant main effects of listener group [F(1, 15) = 13. , p = . ], enhancement [F(1, 15) = . , p = . ], reverberation [F(1. , .8) = 48. , p < . , Greenhouse-Geisser corrected], and phase distortion [F(1. , .1) = 51. , p < . , Greenhouse-Geisser corrected]. There are significant interactions between enhancement and reverberation [F(3, 45) = 12. , p < . ], between enhancement and phase distortion [F(1. , .0) = 9. , p < . , Greenhouse-Geisser corrected], and between reverberation and phase distortion [F(2. , .7) = 30. , p < . , Greenhouse-Geisser corrected], as well as a significant three-way interaction among enhancement, reverberation, and phase distortion [F(3. , .7) = 11. , p < . , Greenhouse-Geisser corrected].

Figure 3: Human ratings under the Reverberant condition; NH listeners are represented in the left block and the HI listeners are shown in the right block.

Figure 4: Human ratings under the Reverberant-Enhanced condition; NH listeners are represented in the left block and the HI listeners are shown in the right block.

For the Reverberant condition (Fig. 3), shorter reverberation times lead to higher quality ratings and greater degrees of phase distortion lead to lower ratings for both listener groups. The effects of reverberation time (T60) and phase distortion also interact with each other, with stronger effects of phase distortion observed for shorter reverberation times. It is possible that long reverberation times (e.g., 1000 ms) could mask the phase distortion applied to the speech and make it less noticeable.

For the Reverberant-Enhanced condition (Fig. 4), the quality ratings are generally higher than those in the Reverberant condition, indicating that the IRM-based enhancement improved perceived speech quality. Following speech enhancement, the effect of T60 is still present for all degrees of phase distortion. This suggests that the IRM-based enhancement, when trained using non-reverberant noise and speech, is insufficient to remove the adverse effect of reverberation on speech quality. Moreover, following speech enhancement, the effect of phase distortion becomes stronger for short reverberation times.

Table 2: Pearson correlations between subjective and objective ratings in different conditions for the NH and HI groups. The highest correlations are marked in bold. 'Orig.' indicates the original conditions before enhancement. 'Reverb.' stands for reverberant conditions.
Pearson Correlation Coefficients
(Columns: PESQ Orig., PESQ Enhanced, HASQI Orig., HASQI Enhanced; rows: Noisy and Reverb. conditions for the NH and HI groups.)
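The entries in Table 2 are Pearson correlation coefficients between the subjective ratings and each objective metric's predictions over the same stimuli. Computing such a coefficient is straightforward; a minimal sketch (the function name and the example numbers are ours, purely illustrative):

```python
import numpy as np

def pearson_r(subjective, objective):
    """Pearson correlation between subjective ratings and an objective
    metric's scores for the same set of stimuli."""
    return float(np.corrcoef(subjective, objective)[0, 1])
```

For example, pearson_r applied to a set of mean MUSHRA ratings and the corresponding PESQ (or HASQI) scores yields one cell of Table 2.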
4. Conclusions
We investigated the influence of phase distortion on perceived speech quality in NH and HI listeners. Hearing-impaired listeners tend to provide higher ratings than NH listeners for the same speech stimulus corrupted by either background noise or reverberation. However, their quality ratings depend on the degree of phase distortion in a similar way. Following phase-insensitive enhancement, both HI and NH listeners can differentiate the degree of phase distortion that remains in the enhanced speech, indicating potential benefits from phase-sensitive enhancement techniques. We surmise that these HI listeners may notice phase distortions because (1) they have good TFS sensitivity, or (2) TFS and phase cues are weighted more heavily in quality tasks than in recognition tasks. Future efforts will address these points. Between the two objective speech-quality metrics, PESQ correlates more closely with the subjective ratings than HASQI, especially for the enhanced speech.
5. Acknowledgements
This work was supported by the Indiana University Vice Provost for Research through the Faculty Research Support Program (co-PIs: D. S. Williamson and Y. Shen) and by NIH grant R01DC017988 (PI: Y. Shen). We thank James Kates for providing the code for HASQI. We also thank Jillian E. Bassett, Bailey E. Henderlong, Yi Liu and Annie L.K. Main for their assistance in data collection.

6. References

[1] C. Lorenzi and B. C. Moore, "Role of temporal envelope and fine structure cues in speech perception: A review," in Proceedings of the International Symposium on Auditory and Audiological Research, vol. 1, 2007, pp. 263–272.
[2] Y. Xu, M. Chen, P. LaFaire, X. Tan, and C.-P. Richter, "Distorting temporal fine structure by phase shifting and its effects on speech intelligibility and neural phase locking," Scientific Reports, vol. 7, no. 1, pp. 1–9, 2017.
[3] Y. Wang, A. Narayanan, and D. Wang, "On training targets for supervised speech separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1849–1858, 2014.
[4] F. Weninger, J. R. Hershey, J. Le Roux, and B. Schuller, "Discriminatively trained recurrent neural networks for single-channel speech separation," IEEE, 2014, pp. 577–581.
[5] M. H. Soni, N. Shah, and H. A. Patil, "Time-frequency masking-based speech enhancement using generative adversarial network," in ICASSP. IEEE, 2018, pp. 5039–5043.
[6] K. Paliwal, K. Wójcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Communication, vol. 53, no. 4, pp. 465–494, 2011.
[7] D. S. Williamson, Y. Wang, and D. Wang, "Complex ratio masking for monaural speech separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 483–492, 2015.
[8] J. Shargorodsky, S. G. Curhan, G. C. Curhan, and R. Eavey, "Change in prevalence of hearing loss in US adolescents," JAMA, vol. 304, no. 7, pp. 772–778, 2010.
[9] L. P. Yang and Q. J. Fu, "Spectral subtraction-based speech enhancement for cochlear implant patients in background noise," JASA, vol. 117, no. 3, pp. 1001–1004, 2005.
[10] Z. Zhang, Y. Shen, and D. Williamson, "Objective comparison of speech enhancement algorithms with hearing loss simulation," in ICASSP. IEEE, 2019, pp. 6845–6849.
[11] E. Buss, J. W. Hall III, and J. H. Grose, "Temporal fine-structure cues to speech and pure tone modulation in observers with sensorineural hearing loss," Ear and Hearing, vol. 25, no. 3, pp. 242–250, 2004.
[12] B. C. Moore, B. R. Glasberg, and K. Hopkins, "Frequency discrimination of complex tones by hearing-impaired subjects: Evidence for loss of ability to use temporal fine structure," Hearing Research, vol. 222, no. 1-2, pp. 16–27, 2006.
[13] C. Lorenzi, G. Gilbert, H. Carn, S. Garnier, and B. C. Moore, "Speech perception problems of the hearing impaired reflect inability to use temporal fine structure," Proceedings of the National Academy of Sciences, vol. 103, no. 49, pp. 18866–18869, 2006.
[14] K. Hopkins, B. C. Moore, and M. A. Stone, "Effects of moderate cochlear hearing loss on the ability to benefit from temporal fine structure information in speech," The Journal of the Acoustical Society of America, vol. 123, no. 2, pp. 1140–1153, 2008.
[15] J. M. Kates and K. H. Arehart, "The hearing-aid speech quality index (HASQI) version 2," Journal of the Audio Engineering Society, vol. 62, no. 3, pp. 99–117, 2014.
[16] ITU-R, "Recommendation BS.1534-2: Method for the subjective assessment of intermediate quality level of audio systems," 2014.
[17] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," vol. 2. IEEE, 2001, pp. 749–752.
[18] R. C. Mathes and R. L. Miller, "Phase effects in monaural perception," The Journal of the Acoustical Society of America, vol. 19, no. 5, pp. 780–797, 1947. [Online]. Available: https://doi.org/10.1121/1.1916623
[19] J. H. Craig and L. A. Jeffress, "Effect of phase on the quality of a two-component tone," The Journal of the Acoustical Society of America, vol. 34, no. 11, pp. 1752–1760, 1962. [Online]. Available: https://doi.org/10.1121/1.1909118
[20] R. Plomp and H. J. M. Steeneken, "Effect of phase on the timbre of complex tones," The Journal of the Acoustical Society of America, vol. 46, no. 2B, pp. 409–421, 1969. [Online]. Available: https://doi.org/10.1121/1.1911705
[21] E. Rothauser, "IEEE recommended practice for speech quality measurements," IEEE Trans. on Audio and Electroacoustics, vol. 17, pp. 225–246, 1969.
[22] D. Byrne and H. Dillon, "The National Acoustic Laboratories' (NAL) new procedure for selecting the gain and frequency response of a hearing aid," Ear and Hearing, vol. 7, no. 4, pp. 257–265, 1986.
[23] A. J. Spahr, M. F. Dorman, L. M. Litvak, S. Van Wie, R. H. Gifford, P. C. Loizou, L. M. Loiselle, T. Oakes, and S. Cook, "Development and validation of the AzBio sentence lists," Ear and Hearing, vol. 33, no. 1, p. 112, 2012.
[24] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979. [Online]. Available: https://doi.org/10.1121/1.382599
[25] A. A. Kressner, D. V. Anderson, and C. J. Rozell, "Robustness of the hearing aid speech quality index (HASQI)," IEEE, 2011, pp. 209–212.
[26] M. R. Wirtzfeld, N. Pourmand, V. Parsa, and I. C. Bruce, "Predicting the quality of enhanced wideband speech with a cochlear model,"