Guangji Shi
University of Toronto
Publication
Featured research published by Guangji Shi.
systems man and cybernetics | 2004
Parham Aarabi; Guangji Shi
A dual-microphone speech-signal enhancement algorithm, utilizing phase-error based filters that depend only on the phase of the signals, is proposed. This algorithm involves obtaining time-varying, or alternatively, time-frequency (TF), phase-error filters based on prior knowledge regarding the time difference of arrival (TDOA) of the speech source of interest and the phases of the signals recorded by the microphones. It is shown that by masking the TF representation of the speech signals, the noise components are distorted beyond recognition while the speech source of interest maintains its perceptual quality. This is supported by digit recognition experiments which show a substantial recognition accuracy rate improvement over prior multimicrophone speech enhancement algorithms. For example, for a case with two speakers with a 0.1 s reverberation time, the phase-error based technique results in a 28.9% recognition rate gain over the single channel noisy signal, a gain of 22.0% over superdirective beamforming, and a gain of 8.5% over postfiltering.
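The masking idea in this abstract can be illustrated for a single time-frequency bin. The sketch below is not taken from the paper: the mask shape `1 / (1 + gamma * theta**2)`, the parameter name `gamma`, and the sign convention (microphone 2 receives the target `tdoa` seconds after microphone 1) are assumptions chosen for illustration.

```python
import cmath
import math


def wrap_phase(theta):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def phase_error(x1, x2, omega, tdoa):
    """Phase error of one TF bin.

    x1, x2 : complex STFT coefficients from microphones 1 and 2
    omega  : angular frequency of the bin in rad/s
    tdoa   : assumed delay of microphone 2 relative to microphone 1 (seconds)
    """
    observed = cmath.phase(x1) - cmath.phase(x2)  # measured phase difference
    expected = omega * tdoa                       # difference implied by the TDOA
    return wrap_phase(observed - expected)


def phase_error_mask(x1, x2, omega, tdoa, gamma=5.0):
    """Illustrative mask (assumed form): 1 for zero phase error, smaller otherwise."""
    theta = phase_error(x1, x2, omega, tdoa)
    return 1.0 / (1.0 + gamma * theta * theta)


# Toy bin at 1 kHz with a target TDOA of 0.1 ms.
omega = 2.0 * math.pi * 1000.0
tdoa = 1e-4
x1 = cmath.rect(1.0, 0.3)
x2_target = x1 * cmath.exp(-1j * omega * tdoa)         # consistent with the TDOA
x2_noise = x1 * cmath.exp(-1j * (omega * tdoa + 1.0))  # 1 rad of phase error

target_mask = phase_error_mask(x1, x2_target, omega, tdoa)
noise_mask = phase_error_mask(x1, x2_noise, omega, tdoa)
```

In this toy case the bin that agrees with the TDOA keeps a mask close to 1, while the bin with 1 rad of phase error is attenuated to about 0.17. Multiplying every bin of both channels by such a mask is one way the TF representation could be masked.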
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Guangji Shi; Maryam Modir Shanechi; Parham Aarabi
In this paper, we analyze the effects of uncertainty in the phase of speech signals on the word recognition error rate of human listeners. The motivating goal is to obtain a quantitative measure of the importance of phase in automatic speech recognition by studying the effects of phase uncertainty on human perception. Listening tests were conducted for 18 listeners under different phase uncertainty and signal-to-noise ratio (SNR) conditions. The results indicate that a small amount of phase error or uncertainty does not affect the recognition rate, but a large amount of phase uncertainty significantly affects it. The importance of phase also appears to be SNR-dependent: at lower SNRs the effects of phase uncertainty are more pronounced than at higher SNRs. For example, at an SNR of -10 dB, having random phases at all frequencies results in a word error rate (WER) of 63%, compared to 24% if the phase is unaltered. In comparison, at 0 dB, random phase results in a 25% WER, compared to 11% for the unaltered phase case. Listening tests were also conducted for the case of reconstructed phase based on the least square error estimation approach. The results indicate that the recognition rate for the reconstructed phase case is very close to that of the perfect phase case (a WER difference of 4% on average).
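One plausible way to generate stimuli with a controlled amount of phase uncertainty is sketched below. This is an assumption, not the procedure described in the paper: the sketch adds uniformly distributed phase noise in `[-max_error, max_error]` to each spectral coefficient while keeping its magnitude, so that `max_error = pi` corresponds to fully random phase. The function name and the uniform noise model are hypothetical.

```python
import cmath
import math
import random


def perturb_phase(spectrum, max_error, rng):
    """Return a copy of `spectrum` with uniform phase noise added to every bin.

    spectrum  : list of complex spectral coefficients
    max_error : largest phase offset in radians (math.pi gives random phase)
    rng       : random.Random instance, passed in for reproducibility
    """
    perturbed = []
    for coeff in spectrum:
        offset = rng.uniform(-max_error, max_error)
        # Keep the magnitude, change only the phase.
        perturbed.append(cmath.rect(abs(coeff), cmath.phase(coeff) + offset))
    return perturbed


rng = random.Random(0)
spectrum = [1 + 1j, -2 + 0.5j, 0.3 - 4j]
random_phase = perturb_phase(spectrum, math.pi, rng)  # fully random phase
unaltered = perturb_phase(spectrum, 0.0, rng)         # no uncertainty
```

The magnitudes of `random_phase` match those of `spectrum`, so any change in intelligibility after resynthesis would be attributable to phase alone.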
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Guangji Shi; Parham Aarabi; Hui Jiang
This paper proposes a phase-based dual-microphone speech enhancement technique that utilizes a prior speech model. Recently, it has been shown that phase-based dual-microphone filters can result in significant noise reduction in low signal-to-noise ratio (SNR) conditions (less than 10 dB) and negligible distortion at high SNRs (greater than 10 dB), as long as a correct filter parameter is chosen at each SNR. While prior work utilizes a constant parameter for all SNRs, we present an SNR-adaptive filter parameter estimation algorithm that maximizes the likelihood of the enhanced speech features based on a prior speech model. Experimental results using the CARVUI database show significant speech recognition accuracy rate improvement over alternative techniques in low SNR situations (e.g., an improvement of 11% in word error rate (WER) over postfiltering and 23% over delay-and-sum beamforming at 0 dB) and negligible distortion at high SNRs. The proposed adaptive approach also significantly outperforms the original phase-based filter with a constant parameter. Furthermore, it improves the filter's robustness when there are errors in time delay estimation.
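The parameter selection step can be sketched as a search over candidate filter parameters, scoring each by the likelihood of the resulting features. Everything concrete here is an assumption made for illustration: a single diagonal-covariance Gaussian stands in for the prior speech model, a grid search stands in for the estimation algorithm, and `enhance` is a hypothetical callable that returns feature vectors for a given parameter value.

```python
import math


def gaussian_loglik(features, mean, var):
    """Log-likelihood of feature frames under a diagonal Gaussian (stand-in prior model)."""
    total = 0.0
    for frame in features:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_parameter(candidates, enhance, mean, var):
    """Pick the candidate whose enhanced features are most likely under the model."""
    return max(candidates, key=lambda g: gaussian_loglik(enhance(g), mean, var))


# Toy stand-in for the enhancement front end: features scale with the parameter.
def toy_enhance(g):
    return [[g, 2.0 * g]]


best = select_parameter([0.5, 1.0, 2.0, 4.0], toy_enhance,
                        mean=[2.0, 4.0], var=[1.0, 1.0])
```

With these toy inputs the search returns 2.0, the value whose features coincide with the model mean. In a real system the search would be repeated as the SNR changes, which is what makes the parameter SNR-adaptive.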
international conference on acoustics, speech, and signal processing | 2003
Guangji Shi; Parham Aarabi
A technique using the time-frequency phase information of two microphones is proposed to estimate an ideal time-frequency mask using time-delay-of-arrival (TDOA) of the signal of interest. At a signal-to-noise ratio (SNR) of 0 dB, the proposed technique using two microphones achieves a digit recognition rate (average over 5 speakers, each speaking 20-30 digits) of 71%. In contrast, delay-and-sum beamforming only achieves a 40% recognition rate with two microphones and 60% with four microphones. Superdirective beamforming achieves a 44% recognition rate with two microphones and 65% with four microphones.
Archive | 2005
Parham Aarabi; Guangji Shi; Maryam Modir Shanechi; Seyed Alireza Rabi
The performance of automatic speech recognition (ASR) systems degrades significantly in adverse environments due to ambient noise and reverberation. This problem becomes even greater in hands-free speech applications, where the microphones can be placed far away from the speaker of interest. Environmental robustness has become a major barrier that prevents ASR from being used in a wide range of applications, such as voice recognition in a car and voice-controlled hand-held devices. In this research, the importance of phase in robust speech recognition is explored. First, the effect of phase uncertainty on the recognition accuracy of human listeners is investigated. The goal is to obtain a quantitative measure of the importance of phase. The results show that the importance of phase varies with SNR (signal-to-noise ratio). At low SNR conditions, phase can have a significant impact on speech recognition accuracy. Next, motivated by the importance of phase in multi-microphone signal processing, a phase-based dual-microphone noise masking approach is proposed for speech enhancement. By utilizing the time delay of the speech source of interest to the two microphones and the actual phases of the signals recorded by both microphones, the algorithm filters the noise signal in the short-time Fourier transform domain. By doing so, the noise components are distorted beyond recognition and the speech recognition accuracy is improved. The effectiveness of this approach is demonstrated through performance comparison with alternative techniques. Lastly, an automatic parameter estimation technique is developed to further optimize its performance. The parameter of the phase-based dual-microphone filter is adjusted automatically at run time by performing likelihood calculations of the enhanced speech features using a prior speech model. Speech recognition tests show that this adaptive approach not only achieves better recognition accuracy, but also improves the filter's robustness when time delay estimates are inaccurate.
international conference on multimedia and expo | 2003
Parham Aarabi; Guangji Shi; Omid S. Jahromi
A multi-microphone time-frequency speech masking technique is proposed. This technique utilizes both the time-frequency magnitude and phase information in order to estimate the signal-to-noise ratio (SNR) maximizing masking coefficients for each time-frequency block given that the direction (or alternatively, the time-delay of arrival) of the speaker of interest is known. Using this masking algorithm, speech features (such as formants) from the direction of interest are preserved while features from other directions are severely degraded. Digit recognition experiments indicate that the proposed technique can result in a substantial increase in the digit recognition accuracy rate. At 0 dB, for example, the proposed technique results in a digit recognition accuracy rate improvement of 26% over the single microphone case and an improvement of 12% over the two microphone superdirective beamforming case.
international conference on information fusion | 2002
Parham Aarabi; Guangji Shi
This paper proposes an efficient mechanism for the fusion of two noisy speech signals obtained by an array of two microphones using single-tap time-frequency filters and by taking into account the correct time delay of arrival (TDOA) of the speech source. Speech signals obtained by the microphones are transformed into a set of two complex time-frequency (TF) images. By knowing the correct TDOA, and therefore the associated phase difference between the signals at each frequency, it is possible to non-linearly filter both the real and the imaginary parts of the TF images. This is done with a TF reward-punish filter that adjusts the amplitude of the TF blocks based upon the deviation of their phase difference from the ideal phase difference defined by the TDOA. Simulation results show that the proposed technique can achieve a signal-to-noise ratio (SNR) improvement of 15 dB when there is strong Gaussian noise present (-20 dB initial SNR). When the original SNR is 0 dB, the simulated improvement is approximately 8 dB. It is also shown that although the proposed technique is a more general case of the adaptive beamformer (where the adaptive beamformer has a specific reward-punish characteristic), other reward-punish characteristics that are proposed in this paper can often surpass the performance of the ideal adaptive beamformer.
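The notion of a reward-punish characteristic can be made concrete with a small sketch. It relies on one standard fact and one assumption. The fact: a two-microphone delay-and-sum beamformer applied to a bin with equal magnitudes and residual phase error `theta` has an implicit gain of `|cos(theta / 2)|`. The assumption: `sharp_gain`, a Gaussian-shaped characteristic with a hypothetical `width` parameter, is only an example of a steeper curve and is not one of the characteristics proposed in the paper.

```python
import cmath
import math


def delay_and_sum_gain(theta):
    """Implicit gain of two-mic delay-and-sum for phase error theta (equal magnitudes)."""
    return abs(math.cos(theta / 2.0))


def sharp_gain(theta, width=0.5):
    """Hypothetical steeper characteristic: rewards small errors, punishes large ones."""
    return math.exp(-((theta / width) ** 2))


def apply_gain(coeff, gain):
    """Scale the real and imaginary parts of one TF coefficient by the same gain."""
    return complex(coeff.real * gain, coeff.imag * gain)


def snr_improvement_db(target_gain, noise_gain):
    """SNR change in dB when target and noise bins receive different amplitude gains."""
    return 20.0 * math.log10(target_gain / noise_gain)


# Check the delay-and-sum identity on one bin with 1 rad of phase error.
theta = 1.0
x1 = cmath.rect(1.0, 0.2)
x2 = x1 * cmath.exp(-1j * theta)  # second channel after TDOA compensation
summed = abs((x1 + x2) / 2.0)     # equals delay_and_sum_gain(theta)

# Compare how strongly each characteristic suppresses that bin relative to theta = 0.
das_db = snr_improvement_db(delay_and_sum_gain(0.0), delay_and_sum_gain(theta))
sharp_db = snr_improvement_db(sharp_gain(0.0), sharp_gain(theta))
```

For this single bin the steeper curve suppresses the off-target component far more than the cosine curve does, which illustrates how the choice of characteristic could affect the achievable SNR improvement. The figures from this toy bin are not comparable to the simulation results quoted in the abstract.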
international conference on information fusion | 2003
Guangji Shi; Parham Aarabi; N. Lazic