Dongwen Ying
Chinese Academy of Sciences
Publication
Featured research published by Dongwen Ying.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Dongwen Ying; Yonghong Yan; Jianwu Dang; Frank K. Soong
How to construct models for speech/nonspeech discrimination is a crucial point for voice activity detectors (VADs). Semi-supervised learning is the most popular way to construct models in conventional VADs. In this correspondence, we propose an unsupervised learning framework to construct statistical models for VAD. This framework is realized by a sequential Gaussian mixture model (GMM). It comprises an initialization process and an updating process. At each subband, the GMM is first initialized using the EM algorithm and then sequentially updated frame by frame. From the GMM, a self-regulatory threshold for discrimination is derived at each subband. Some constraints are imposed on this GMM for the sake of reliability. Because the learning is unsupervised, the proposed VAD does not rely on the assumption, widely used in most VADs, that the first several frames of an utterance are nonspeech. Moreover, the speech presence probability in the time-frequency domain is a byproduct of this VAD. We tested it on speech from the TIMIT database and noise from the NOISEX-92 database. The evaluations showed promising performance in comparison with VADs such as ITU G.729B, GSM AMR, and a typical semi-supervised VAD.
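As a rough illustration of the initialization step described above, the sketch below fits a two-component, one-dimensional GMM to log-power samples with batch EM and then picks a threshold between the two means. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the initialization heuristic, the variance floor, and the equal-weighted-density threshold rule are all hypothetical, and the frame-by-frame sequential update and the constraints mentioned in the abstract are omitted.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(samples, iterations=50, var_floor=1e-3):
    """Batch EM for a 1-D two-component GMM.

    Component 0 is initialised low (assumed nonspeech), component 1 high
    (assumed speech). The initialisation is a heuristic, not from the paper.
    """
    lo, hi = min(samples), max(samples)
    span = max(hi - lo, 1e-6)
    means = [lo + 0.25 * span, lo + 0.75 * span]
    variances = [(span / 4.0) ** 2, (span / 4.0) ** 2]
    weights = [0.5, 0.5]
    n = len(samples)
    for _ in range(iterations):
        # E-step: posterior probability that each sample came from component 1.
        resp = []
        for x in samples:
            p0 = weights[0] * gaussian_pdf(x, means[0], variances[0])
            p1 = weights[1] * gaussian_pdf(x, means[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the previous parameters
        # M-step: re-estimate means, variances and weights.
        means = [
            sum((1.0 - r) * x for r, x in zip(resp, samples)) / n0,
            sum(r * x for r, x in zip(resp, samples)) / n1,
        ]
        variances = [
            max(sum((1.0 - r) * (x - means[0]) ** 2 for r, x in zip(resp, samples)) / n0, var_floor),
            max(sum(r * (x - means[1]) ** 2 for r, x in zip(resp, samples)) / n1, var_floor),
        ]
        weights = [n0 / n, n1 / n]
    return weights, means, variances


def decision_threshold(weights, means, variances, steps=60):
    """Point between the two means where the weighted densities are equal.

    This rule is an assumption; the abstract does not specify how the
    self-regulatory threshold is derived.
    """
    def diff(x):
        return (weights[1] * gaussian_pdf(x, means[1], variances[1])
                - weights[0] * gaussian_pdf(x, means[0], variances[0]))

    lo, hi = means[0], means[1]
    if diff(lo) >= 0.0 or diff(hi) <= 0.0:
        return 0.5 * (lo + hi)  # no sign change: fall back to the midpoint
    for _ in range(steps):  # bisection on the sign change
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A frame whose subband log power exceeds the returned threshold would be labelled speech in that subband under these assumptions.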
PLOS ONE | 2015
Jie Liu; Dongwen Ying; William Z. Rymer; Ping Zhou
Accurate muscle activity onset detection is an essential prerequisite for many applications of surface electromyogram (EMG). This study presents an unsupervised EMG learning framework based on a sequential Gaussian mixture model (GMM) to detect muscle activity onsets. The distribution of the logarithmic power of the EMG signal was characterized by a two-component GMM in each frequency band, in which the two components respectively correspond to the posterior distribution of EMG burst and non-burst logarithmic powers. The parameter set of the GMM was sequentially estimated based on maximum likelihood, subject to constraints derived from the relationship between EMG burst and non-burst distributions. An optimal threshold for EMG burst/non-burst classification was determined using the GMM at each frequency band, and the final decision was obtained by a voting procedure. The proposed framework was applied to simulated and experimental surface EMG signals for muscle activity onset detection. Compared with conventional approaches, it demonstrated robust performance for low and changing signal-to-noise ratios in a dynamic environment. The framework is suitable for real-time implementation and does not require the assumption that the initial stage contains no EMG burst. Such features facilitate its practical application.
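The per-band thresholding followed by a voting procedure can be pictured with the short sketch below. The abstract does not state the voting rule, so the strict-majority default, the function name, and the data layout are assumptions made for illustration only.

```python
def detect_bursts(band_log_power, band_thresholds, min_votes=None):
    """Combine per-band burst decisions into one decision per frame.

    band_log_power[b][t] is the log power of band b at frame t, and
    band_thresholds[b] is the burst/non-burst threshold of band b.
    A frame is marked as a burst when at least min_votes bands exceed
    their thresholds (strict majority by default, an assumed rule).
    """
    n_bands = len(band_log_power)
    if min_votes is None:
        min_votes = n_bands // 2 + 1
    n_frames = len(band_log_power[0])
    decisions = []
    for t in range(n_frames):
        votes = sum(
            1 for b in range(n_bands)
            if band_log_power[b][t] > band_thresholds[b]
        )
        decisions.append(votes >= min_votes)
    return decisions
```

With three bands, for example, a frame is flagged when any two of them exceed their own thresholds.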
IEEE Signal Processing Letters | 2013
Dongwen Ying; Yonghong Yan
Heavy computational load and acoustic interferences are two major problems for speech source localization in real applications. Conventional methods can mitigate one problem, but deteriorate the other. This letter proposes a direction-of-arrival (DOA) estimation algorithm that is both computationally efficient and robust in the presence of acoustic interferences. The robustness is considered in two aspects. One is the eigenanalysis-based enhancement to reduce acoustic interferences such as noise and reverberation. The other is the coefficients that weight the pairwise time delays to mitigate the effect of delay outliers on DOA. The high computational efficiency is achieved by making use of a concave cost function, from which the optimal estimate of DOA is given by a closed-form solution. The grid-search method often adopted in conventional algorithms is no longer used in this algorithm. We conducted experiments in both simulated and real environments with a 9-element circular array. The proposed algorithm runs about ten times faster than Steered Response Power PHAse Transform (SRP-PHAT), and outperforms SRP-PHAT in terms of robustness.
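To show how weighted pairwise time delays can yield a DOA in closed form without a grid search, here is a generic weighted least-squares sketch for a far-field source and a planar array. It is not the cost function of the letter, which the abstract does not specify: the plane-wave model, the sign convention, the speed of sound, and all function names are assumptions.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed


def pairwise_delays(mics, azimuth_rad):
    """Ideal far-field delays tau_ij = (p_j - p_i) . u / c for all pairs i < j.

    u is the unit vector pointing from the array towards the source, so a
    microphone closer to the source receives the wavefront earlier.
    """
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    delays = {}
    for i, j in combinations(range(len(mics)), 2):
        dx, dy = mics[j][0] - mics[i][0], mics[j][1] - mics[i][1]
        delays[(i, j)] = (dx * ux + dy * uy) / SPEED_OF_SOUND
    return delays


def estimate_azimuth(mics, delays, weights=None):
    """Closed-form weighted least-squares azimuth from pairwise delays.

    Minimises sum_ij w_ij * (tau_ij - (p_j - p_i) . s)^2 over the slowness
    vector s = u / c by solving the 2x2 normal equations with Cramer's rule.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (i, j), tau in delays.items():
        w = 1.0 if weights is None else weights[(i, j)]
        dx, dy = mics[j][0] - mics[i][0], mics[j][1] - mics[i][1]
        a11 += w * dx * dx
        a12 += w * dx * dy
        a22 += w * dy * dy
        b1 += w * tau * dx
        b2 += w * tau * dy
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-18:
        raise ValueError("array geometry does not determine a 2-D direction")
    sx = (a22 * b1 - a12 * b2) / det
    sy = (a11 * b2 - a12 * b1) / det
    return math.atan2(sy, sx)
```

Giving a suspected outlier pair a small or zero weight removes its influence on the estimate, which is the role the abstract assigns to the weighting coefficients.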
Journal of Biomechanics | 2015
Jie Liu; Dongwen Ying; William Z. Rymer
The purpose of this study was to quantify muscle activity in the time-frequency domain, thereby providing an alternative tool to measure muscle activity. This paper presents a novel method to measure muscle activity by utilizing EMG burst presence probability (EBPP) in the time-frequency domain. The EMG signal is grouped into several Mel-scale subbands, and the logarithmic power sequence is extracted from each subband. Each log-power sequence can be regarded as a dynamic process that transits between the states of EMG burst and non-burst. The hidden Markov model (HMM) was employed to describe this dynamic process, since the HMM is intrinsically advantageous in modeling the temporal correlation of EMG burst/non-burst presence. The EBPP was finally obtained from the HMM based on the maximum-likelihood criterion. Our approach achieved comparable performance with the Bonato method.
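The idea of a burst presence probability from a two-state HMM can be sketched with a standard forward-backward pass over one subband's log-power sequence. The sketch assumes the Gaussian emission parameters and a symmetric transition probability are already known; the paper estimates its parameters by maximum likelihood, which is not reproduced here, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def burst_presence_probability(log_power, means, variances, stay_prob=0.9):
    """Posterior probability of the burst state (state 1) at every frame.

    State 0 is non-burst and state 1 is burst. means and variances hold the
    Gaussian emission parameters of the two states (assumed to be given).
    """
    trans = [[stay_prob, 1.0 - stay_prob], [1.0 - stay_prob, stay_prob]]
    n = len(log_power)
    emit = [[gaussian_pdf(x, means[s], variances[s]) for s in (0, 1)] for x in log_power]

    # Forward pass, normalised at every frame to avoid underflow.
    alpha = []
    for t in range(n):
        if t == 0:
            cur = [0.5 * emit[0][s] for s in (0, 1)]
        else:
            cur = [
                emit[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in (0, 1))
                for s in (0, 1)
            ]
        norm = sum(cur)
        alpha.append([c / norm for c in cur] if norm > 0.0 else [0.5, 0.5])

    # Backward pass with the same per-frame normalisation.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [
            sum(trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in (0, 1))
            for s in (0, 1)
        ]
        norm = sum(cur)
        beta[t] = [c / norm for c in cur] if norm > 0.0 else [1.0, 1.0]

    # Combine both passes into the smoothed posterior of the burst state.
    posterior = []
    for t in range(n):
        g0 = alpha[t][0] * beta[t][0]
        g1 = alpha[t][1] * beta[t][1]
        total = g0 + g1
        posterior.append(g1 / total if total > 0.0 else 0.5)
    return posterior
```

Because the transition matrix favours staying in the same state, isolated outlier frames are smoothed over, which is the temporal-correlation benefit the abstract attributes to the HMM.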
Medical Engineering & Physics | 2014
Jie Liu; Dongwen Ying; Ping Zhou
Voluntary surface electromyogram (EMG) signals from neurological injury patients are often corrupted by involuntary background interference or spikes, imposing difficulties for myoelectric control. We present a novel framework to suppress involuntary background spikes during voluntary surface EMG recordings. The framework applies a Wiener filter to restore voluntary surface EMG signals based on tracking a priori signal to noise ratio (SNR) by using the decision-directed method. Semi-synthetic surface EMG signals contaminated by different levels of involuntary background spikes were constructed from a database of surface EMG recordings in a group of spinal cord injury subjects. After the processing, the onset detection of voluntary muscle activity was significantly improved against involuntary background spikes. The magnitude of voluntary surface EMG signals can also be reliably estimated for myoelectric control purpose. Compared with the previous sample entropy analysis for suppressing involuntary background spikes, the proposed framework is characterized by quick and simple implementation, making it more suitable for application in a myoelectric control system toward neurological injury rehabilitation.
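The combination of a Wiener gain with decision-directed tracking of the a priori SNR can be illustrated for a single frequency bin as follows. This is the textbook decision-directed rule, offered as a sketch; the smoothing constant, the SNR floor, the assumption of a known and constant interference power, and the function name are not taken from the paper.

```python
def wiener_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame Wiener gains for one frequency bin.

    noisy_power: sequence of observed power values, one per frame.
    noise_power: interference power, assumed known and constant here.
    alpha:       decision-directed smoothing constant (assumed value).
    xi_min:      floor on the a priori SNR (assumed value).
    """
    gains = []
    prev_clean_power = 0.0  # estimated clean power of the previous frame
    for power in noisy_power:
        gamma = power / noise_power  # a posteriori SNR
        # Decision-directed estimate of the a priori SNR.
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)  # Wiener gain
        gains.append(gain)
        prev_clean_power = (gain ** 2) * power
    return gains
```

Multiplying each frame's spectral amplitude by its gain attenuates frames dominated by interference while leaving high-SNR frames nearly unchanged.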
International Conference on Acoustics, Speech, and Signal Processing | 2016
Zhaoqiong Huang; Ge Zhan; Dongwen Ying; Yonghong Yan
Spatial aliasing and spatial resolution are the two issues faced by most multiple speech source localization methods. The histogram of time delays is a simple but effective method to deal with these two issues on linear arrays. However, few methods can apply the time delay histogram to direction-of-arrival (DOA) estimation using a planar array. This paper proposes a novel method to estimate the DOAs of multiple speech sources based on time delay histograms across all microphones of a planar array. The pairwise time delays of different sources are first obtained from each time delay histogram, and the time delays are then associated with the different speech sources. Finally, the DOA of each source is estimated by regression over its associated time delays. We conducted experiments in both simulated and real environments to evaluate the proposed method using an eight-element circular array. The experimental results confirmed not only its high computational efficiency, but also its superiority in spatial resolution and spatial anti-aliasing.
Journal of the Acoustical Society of America | 2014
Junfeng Li; Risheng Xia; Dongwen Ying; Yonghong Yan; Masato Akagi
Many objective measures have been reported to predict speech intelligibility in noise, most of which were designed and evaluated with English speech corpora. Given the different perceptual cues used by native listeners of different languages, examining whether there is any language effect when the same objective measure is used to predict speech intelligibility in different languages is of great interest, particularly when non-linear noise-reduction processing is involved. In the present study, an extensive evaluation is undertaken of objective measures for speech intelligibility prediction of noisy speech processed by noise-reduction algorithms in Chinese, Japanese, and English. Of all the objective measures tested, the short-time objective intelligibility (STOI) measure produced the most accurate results in speech intelligibility prediction for Chinese, while the normalized covariance metric (NCM) and middle-level coherence speech intelligibility index (CSIIm) incorporating the signal-dependent band-importance functions (BIFs) produced the most accurate results for Japanese and English, respectively. The objective measures that performed best in predicting the effect of non-linear noise-reduction processing on speech intelligibility were found to be the BIF-modified NCM measure for Chinese, the STOI measure for Japanese, and the BIF-modified CSIIm measure for English. Most of the objective measures examined performed differently even under the same conditions for different languages.
Journal of Neural Engineering | 2014
Jie Liu; Dongwen Ying; William Z. Rymer; Ping Zhou
OBJECTIVE: After neurological injuries such as spinal cord injury, voluntary surface electromyogram (EMG) signals recorded from affected muscles are often corrupted by interferences, such as spurious involuntary spikes and background noise of physiological and extrinsic/accidental origin, which impose difficulties for signal processing. Conventional methods do not adequately mitigate such interferences. The aim of this study was to develop a subspace-based denoising method to suppress involuntary background spikes contaminating voluntary surface EMG recordings. APPROACH: The Karhunen-Loeve transform was utilized to decompose a noisy signal into a signal subspace and a noise subspace. An optimal estimate of the EMG signal is derived from the signal subspace and the noise power. Specifically, this estimator is capable of making a tradeoff between interference reduction and signal distortion. Since the estimator partially relies on the estimate of noise power, an adaptive method was presented to sequentially track the variation of interference power. The proposed method was evaluated using both semi-synthetic and real surface EMG signals. MAIN RESULTS: The experiments confirmed that the proposed method can effectively suppress interferences while keeping the distortion of the voluntary EMG signal at a low level. The proposed method can greatly facilitate further signal processing, such as onset detection of voluntary muscle activity. SIGNIFICANCE: The proposed method can provide a powerful tool for suppressing background spikes and noise contaminating voluntary surface EMG signals of paretic muscles after neurological injuries, which is of great importance for their multi-purpose applications.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Dongwen Ying; Yonghong Yan
The temporal correlation of speech presence/absence is widely used in noise estimation. The most popular technique for exploiting temporal correlation is the smoothing of noisy spectra using a time-recursive filter, in which the forgetting factor is controlled by speech presence probability. However, this technique is not unified into a theoretical framework that enables optimal noise estimation. In theory, hidden Markov models (HMMs) are superior to this technique in modeling temporal correlation. HMMs can model a time sequence of presence/absence of speech signal as a dynamic process of the transition between speech and non-speech states. Moreover, a number of methods, such as maximum likelihood, are available for optimal estimation of HMM parameters. This paper presents a constrained sequential HMM for modeling the log-power sequence on each frequency band. The emission probability of each HMM state is represented by a Gaussian model. The Gaussian mean of the non-speech state is considered as the optimal estimate of noise logarithmic power. The HMM parameter set is sequentially estimated from one frame to another on the basis of maximum likelihood. The proposed method is compared with well-established algorithms through various experiments. Our method delivers more accurate results and does not rely on the assumption of the “non-speech signal onset” as do most algorithms.
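The conventional baseline described above, a time-recursive filter whose forgetting factor is controlled by the speech presence probability, can be sketched as follows for one frequency band. This illustrates the baseline technique only, not the constrained sequential HMM proposed in the paper; the base smoothing constant, the explicit initial noise value, and the function name are assumptions.

```python
def recursive_noise_estimate(noisy_power, speech_presence_prob, initial_noise, alpha=0.9):
    """Track noise power in one band with a probability-controlled filter.

    noisy_power:          observed power per frame.
    speech_presence_prob: probability of speech per frame, in [0, 1].
    initial_noise:        starting noise estimate (must be supplied).
    alpha:                base smoothing constant (assumed value).
    """
    estimates = []
    noise = initial_noise
    for power, p in zip(noisy_power, speech_presence_prob):
        # The forgetting factor rises to 1 as speech becomes certain,
        # which freezes the noise estimate during speech.
        forgetting = alpha + (1.0 - alpha) * p
        noise = forgetting * noise + (1.0 - forgetting) * power
        estimates.append(noise)
    return estimates
```

The need for a starting value is visible in the signature: a common choice is to take it from the first frames, which is the kind of "non-speech signal onset" assumption the proposed HMM method avoids.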
IEEE Global Conference on Signal and Information Processing | 2015
Dongwen Ying; Ge Zhan; Zhaoqiong Huang; Yonghong Yan; Fei Li
Sparsity-based methods are widely used to localize multiple speech sources because of their high computational efficiency. However, spatial aliasing is a challenging issue for sparsity-based speech source localization. For a pair of widely spaced microphones, several candidate time delays may correspond to a given phase difference at some high frequencies. For planar arrays in particular, there may be a large number of possible combinations of these time-delay candidates across all microphone pairs. The purpose of spatial de-aliasing is to determine the number of aliasing periods and to select the optimal combination from those aliasing combinations. This paper proposes a closed-form method of spatial de-aliasing for planar arrays. The convex cost function is defined as the weighted error function of the numbers of aliasing periods. The numbers of aliasing periods are obtained by minimizing the cost function. The proposed method was evaluated in a simulated environment. The experimental results confirmed that the proposed method handles spatial aliasing well.
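The ambiguity that spatial de-aliasing has to resolve can be made concrete by listing every time delay consistent with one measured phase difference. The sketch below assumes the convention phase = 2*pi*f*tau (mod 2*pi), a far-field bound |tau| <= d/c, and a speed of sound of 343 m/s; the function name is hypothetical, and the closed-form selection proposed in the paper is not reproduced.

```python
import math


def delay_candidates(phase_diff, freq_hz, mic_distance, speed_of_sound=343.0):
    """All (k, tau) pairs consistent with one wrapped phase difference.

    tau_k = (phase_diff + 2*pi*k) / (2*pi*freq_hz), restricted to the
    physically possible range |tau| <= mic_distance / speed_of_sound.
    k is the number of aliasing periods.
    """
    max_delay = mic_distance / speed_of_sound
    period = 1.0 / freq_hz
    base = phase_diff / (2.0 * math.pi * freq_hz)
    k_min = math.ceil((-max_delay - base) / period)
    k_max = math.floor((max_delay - base) / period)
    return [(k, base + k * period) for k in range(k_min, k_max + 1)]
```

For a 0.2 m pair, a 500 Hz component has a single candidate, whereas a 4 kHz component has several, and the number of combinations multiplies across the microphone pairs of a planar array.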
Collaboration
National Institute of Information and Communications Technology