Dongwen Ying | Researchain

Publication

Featured research published by Dongwen Ying.

IEEE Transactions on Audio, Speech, and Language Processing | 2011

Voice Activity Detection Based on an Unsupervised Learning Framework

Dongwen Ying; Yonghong Yan; Jianwu Dang; Frank K. Soong

How to construct models for speech/nonspeech discrimination is a crucial issue for voice activity detectors (VADs). Semi-supervised learning is the most popular way to construct models in conventional VADs. In this correspondence, we propose an unsupervised learning framework to construct statistical models for VAD. This framework is realized by a sequential Gaussian mixture model (GMM). It comprises an initialization process and an updating process. At each subband, the GMM is first initialized using the EM algorithm and then sequentially updated frame by frame. From the GMM, a self-regulatory threshold for discrimination is derived at each subband. Some constraints are imposed on this GMM for the sake of reliability. Because the learning is unsupervised, the proposed VAD does not rely on the assumption, widely used in most VADs, that the first several frames of an utterance are nonspeech. Moreover, the speech presence probability in the time-frequency domain is a byproduct of this VAD. We tested it on speech from the TIMIT database and noise from the NOISEX-92 database. The evaluations showed promising performance in comparison with VADs such as ITU G.729B, GSM AMR, and a typical semi-supervised VAD.
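The sequential update described in this abstract can be illustrated with a minimal sketch for a single subband. It is not the authors' implementation: the class name `SequentialGMM`, the forgetting factor, the variance floor, the `min_gap` constraint, and the 0.5-posterior decision rule are all assumptions made for illustration, and the EM initialization is replaced by hand-picked starting values.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


class SequentialGMM:
    """Two-component GMM over subband log power (index 0: noise, 1: speech)."""

    def __init__(self, noise_mean, speech_mean, var=4.0, forget=0.95, min_gap=1.0):
        self.means = [noise_mean, speech_mean]   # assumed initial values (no EM here)
        self.variances = [var, var]
        self.weights = [0.5, 0.5]
        self.forget = forget                     # hypothetical forgetting factor
        self.min_gap = min_gap                   # hypothetical constraint on the means

    def posterior_speech(self, x):
        """Posterior probability that log power x belongs to the speech component."""
        scores = [
            math.log(w) + log_gaussian(x, m, v)
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]
        diff = scores[0] - scores[1]
        if diff > 700.0:       # avoid overflow in exp()
            return 0.0
        if diff < -700.0:
            return 1.0
        return 1.0 / (1.0 + math.exp(diff))

    def update(self, x):
        """Update the model with one frame and return its speech posterior."""
        p_speech = self.posterior_speech(x)
        posteriors = [1.0 - p_speech, p_speech]
        for k in (0, 1):
            step = (1.0 - self.forget) * posteriors[k]
            delta = x - self.means[k]
            self.weights[k] = max(
                self.forget * self.weights[k] + (1.0 - self.forget) * posteriors[k],
                1e-6,
            )
            self.means[k] += step * delta
            self.variances[k] = max(
                self.variances[k] + step * (delta * delta - self.variances[k]), 0.01
            )
        # Illustrative reliability constraint: speech mean stays above noise mean.
        if self.means[1] < self.means[0] + self.min_gap:
            self.means[1] = self.means[0] + self.min_gap
        return p_speech
```

A frame would be labelled speech in this sketch when `update(x)` returns more than 0.5. Because the noise component is tracked continuously, no leading nonspeech segment is needed to estimate it.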

PLOS ONE | 2015

Robust muscle activity onset detection using an unsupervised electromyogram learning framework.

Jie Liu; Dongwen Ying; William Z. Rymer; Ping Zhou

Accurate muscle activity onset detection is an essential prerequisite for many applications of surface electromyogram (EMG). This study presents an unsupervised EMG learning framework based on a sequential Gaussian mixture model (GMM) to detect muscle activity onsets. The distribution of the logarithmic power of the EMG signal was characterized by a two-component GMM in each frequency band, in which the two components respectively correspond to the posterior distribution of EMG burst and non-burst logarithmic powers. The parameter set of the GMM was sequentially estimated based on maximum likelihood, subject to constraints derived from the relationship between EMG burst and non-burst distributions. An optimal threshold for EMG burst/non-burst classification was determined using the GMM at each frequency band, and the final decision was obtained by a voting procedure. The proposed framework was applied to simulated and experimental surface EMG signals for muscle activity onset detection. Compared with conventional approaches, it demonstrated robust performance for low and changing signal-to-noise ratios in a dynamic environment. The framework is suitable for real-time implementation and does not require the assumption that the initial stage contains no EMG burst. Such features facilitate its practical application.
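The per-band thresholding and voting step mentioned above might look like the following sketch. The abstract does not specify the voting rule, so the `min_votes` parameter and the function names `vote_frames` and `first_onset` are assumptions, and the thresholds are taken as given rather than derived from a GMM.

```python
def vote_frames(band_log_power, thresholds, min_votes):
    """Combine per-band burst decisions into one decision per frame.

    band_log_power[b][t] is the log power of band b at frame t.
    A band votes "burst" when its log power exceeds that band's threshold.
    """
    n_frames = len(band_log_power[0])
    decisions = []
    for t in range(n_frames):
        votes = sum(
            1 for band, thr in zip(band_log_power, thresholds) if band[t] > thr
        )
        decisions.append(votes >= min_votes)
    return decisions


def first_onset(decisions):
    """Index of the first frame voted as burst, or None if there is none."""
    for t, active in enumerate(decisions):
        if active:
            return t
    return None
```

Requiring agreement from several bands is one way to keep a single noisy band from triggering a false onset.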

IEEE Signal Processing Letters | 2013

Robust and Fast Localization of Single Speech Source Using a Planar Array

Dongwen Ying; Yonghong Yan

Heavy computational load and acoustic interferences are two major problems for speech source localization in real applications. Conventional methods can mitigate one problem, but deteriorate the other. This letter proposes a direction-of-arrival (DOA) estimation algorithm that is both computationally efficient and robust in the presence of acoustic interferences. The robustness is considered in two aspects. One is the eigenanalysis-based enhancement to reduce acoustic interferences such as noise and reverberation. The other is the coefficients that weight the pairwise time delays to mitigate the effect of delay outliers on DOA. The high computational efficiency is achieved by making use of a concave cost function, from which the optimal estimate of DOA is given by a closed-form solution. The grid-search method often adopted in conventional algorithms is no longer needed. We conducted experiments in both simulated and real environments with a 9-element circular array. The proposed algorithm runs about ten times faster than Steered Response Power PHAse Transform (SRP-PHAT), and outperforms SRP-PHAT in terms of robustness.
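To show how a closed-form DOA estimate can replace a grid search, here is a minimal sketch using generic weighted least squares over pairwise time delays under a far-field, in-plane source model. This is a standard formulation, not necessarily the cost function of the letter. The function names, the sign convention for the delays, and the speed of sound are assumptions.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed


def circular_array(n_mics, radius):
    """Microphone (x, y) positions evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * k / n_mics),
         radius * math.sin(2 * math.pi * k / n_mics))
        for k in range(n_mics)
    ]


def pairwise_delays(mics, azimuth):
    """Ideal far-field delays tau_ij = (p_j - p_i) . u / c for every pair."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    delays = {}
    for i, j in combinations(range(len(mics)), 2):
        dx = mics[j][0] - mics[i][0]
        dy = mics[j][1] - mics[i][1]
        delays[(i, j)] = (dx * ux + dy * uy) / SPEED_OF_SOUND
    return delays


def estimate_azimuth(mics, delays, weights=None):
    """Solve the 2x2 weighted normal equations for the slowness vector u / c."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (i, j), tau in delays.items():
        w = 1.0 if weights is None else weights[(i, j)]
        dx = mics[j][0] - mics[i][0]
        dy = mics[j][1] - mics[i][1]
        a11 += w * dx * dx
        a12 += w * dx * dy
        a22 += w * dy * dy
        b1 += w * dx * tau
        b2 += w * dy * tau
    det = a11 * a22 - a12 * a12
    if abs(det) <= 1e-12 * (a11 + a22) ** 2:
        raise ValueError("array geometry is degenerate for 2-D estimation")
    sx = (a22 * b1 - a12 * b2) / det
    sy = (a11 * b2 - a12 * b1) / det
    return math.atan2(sy, sx)
```

Setting the weight of a pair to zero removes its delay from the estimate, which is the role the abstract assigns to the weighting coefficients for delay outliers.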

Journal of Biomechanics | 2015

EMG burst presence probability: A joint time–frequency representation of muscle activity and its application to onset detection

Jie Liu; Dongwen Ying; William Z. Rymer

The purpose of this study was to quantify muscle activity in the time-frequency domain, thereby providing an alternative tool to measure muscle activity. This paper presents a novel method to measure muscle activity by utilizing EMG burst presence probability (EBPP) in the time-frequency domain. The EMG signal is grouped into several Mel-scale subbands, and the logarithmic power sequence is extracted from each subband. Each log-power sequence can be regarded as a dynamic process that transitions between the states of EMG burst and non-burst. A hidden Markov model (HMM) was employed to describe this dynamic process, since an HMM is intrinsically advantageous in modeling the temporal correlation of EMG burst/non-burst presence. The EBPP was then obtained from the HMM based on the maximum likelihood criterion. Our approach achieved performance comparable to that of the Bonato method.
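The idea of reading a burst presence probability off a two-state HMM can be sketched with a forward (filtering) recursion on one subband's log-power sequence. This sketch assumes fixed, known Gaussian emission parameters and a symmetric transition matrix, whereas the paper estimates its parameters by maximum likelihood. The function name and default values are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def burst_presence_probability(log_power, means, variances,
                               stay_prob=0.9, prior_burst=0.5):
    """Return P(state = burst | observations up to frame t) for each frame t.

    State 0 is non-burst and state 1 is burst; means and variances hold
    the Gaussian emission parameters of the two states.
    """
    trans = [[stay_prob, 1.0 - stay_prob],
             [1.0 - stay_prob, stay_prob]]
    belief = [1.0 - prior_burst, prior_burst]
    probabilities = []
    for x in log_power:
        # Prediction step: propagate the belief through the transition matrix.
        predicted = [
            belief[0] * trans[0][k] + belief[1] * trans[1][k] for k in (0, 1)
        ]
        # Update step: weight by emission likelihoods (shifted for stability).
        log_like = [log_gaussian(x, means[k], variances[k]) for k in (0, 1)]
        shift = max(log_like)
        joint = [predicted[k] * math.exp(log_like[k] - shift) for k in (0, 1)]
        total = joint[0] + joint[1]
        belief = [joint[0] / total, joint[1] / total]
        probabilities.append(belief[1])
    return probabilities
```

The transition matrix is what carries temporal correlation: with `stay_prob` close to 1, an isolated ambiguous frame tends to inherit the state of its predecessors.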

Medical Engineering & Physics | 2014

Wiener filtering of surface EMG with a priori SNR estimation toward myoelectric control for neurological injury patients

Jie Liu; Dongwen Ying; Ping Zhou

Voluntary surface electromyogram (EMG) signals from neurological injury patients are often corrupted by involuntary background interference or spikes, imposing difficulties for myoelectric control. We present a novel framework to suppress involuntary background spikes during voluntary surface EMG recordings. The framework applies a Wiener filter to restore voluntary surface EMG signals based on tracking the a priori signal-to-noise ratio (SNR) using the decision-directed method. Semi-synthetic surface EMG signals contaminated by different levels of involuntary background spikes were constructed from a database of surface EMG recordings in a group of spinal cord injury subjects. After the processing, the onset detection of voluntary muscle activity was significantly improved against involuntary background spikes. The magnitude of voluntary surface EMG signals can also be reliably estimated for myoelectric control purposes. Compared with the previous sample entropy analysis for suppressing involuntary background spikes, the proposed framework is characterized by quick and simple implementation, making it more suitable for application in a myoelectric control system toward neurological injury rehabilitation.
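The decision-directed tracking of the a priori SNR can be sketched as follows for a single frequency band. This is the textbook decision-directed rule combined with the Wiener gain, not necessarily the exact variant used in the paper. The function name, the smoothing constant `alpha`, the floor `xi_min`, and the assumption of a known, constant noise power are illustrative.

```python
def wiener_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame Wiener gains with decision-directed a priori SNR tracking.

    noisy_power: observed power per frame in one band.
    noise_power: interference power in that band (assumed known and constant).
    """
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")
    gains = []
    prev_clean_power = 0.0  # estimated clean power of the previous frame
    for y2 in noisy_power:
        gamma = y2 / noise_power  # a posteriori SNR
        # Decision-directed estimate: blend the previous clean estimate
        # with the instantaneous (maximum-likelihood) SNR estimate.
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)  # Wiener gain
        gains.append(gain)
        prev_clean_power = gain * gain * y2
    return gains
```

Multiplying each frame's amplitude by its gain attenuates frames dominated by interference while passing frames with a high estimated SNR almost unchanged.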

International Conference on Acoustics, Speech, and Signal Processing | 2016

Robust multiple speech source localization using time delay histogram

Zhaoqiong Huang; Ge Zhan; Dongwen Ying; Yonghong Yan

Spatial aliasing and spatial resolution are two issues faced by most multiple speech source localization methods. The histogram of time delays is a simple but effective way to deal with these two issues on linear arrays, but few methods have been able to apply the time delay histogram to direction-of-arrival (DOA) estimation using a planar array. This paper proposes a novel method to estimate the DOAs of multiple speech sources based on time delay histograms across all microphones of a planar array. The pairwise time delays of different sources are first obtained from each time delay histogram, and the time delays are then associated with the different speech sources. Finally, the DOA of each source is estimated by regression over its associated time delays. We conducted experiments in both simulated and real environments to evaluate the proposed method using an eight-element circular array. The experimental results confirmed not only its high computational efficiency, but also its superiority in spatial resolution and spatial anti-aliasing.

Journal of the Acoustical Society of America | 2014

Investigation of objective measures for intelligibility prediction of noise-reduced speech for Chinese, Japanese, and English

Junfeng Li; Risheng Xia; Dongwen Ying; Yonghong Yan; Masato Akagi

Many objective measures have been reported to predict speech intelligibility in noise, most of which were designed and evaluated with English speech corpora. Given the different perceptual cues used by native listeners of different languages, examining whether there is any language effect when the same objective measure is used to predict speech intelligibility in different languages is of great interest, particularly when non-linear noise-reduction processing is involved. In the present study, an extensive evaluation is conducted of objective measures for speech intelligibility prediction of noisy speech processed by noise-reduction algorithms in Chinese, Japanese, and English. Of all the objective measures tested, the short-time objective intelligibility (STOI) measure produced the most accurate results in speech intelligibility prediction for Chinese, while the normalized covariance metric (NCM) and middle-level coherence speech intelligibility index (CSIIm) incorporating the signal-dependent band-importance functions (BIFs) produced the most accurate results for Japanese and English, respectively. The objective measures that performed best in predicting the effect of non-linear noise-reduction processing on speech intelligibility were found to be the BIF-modified NCM measure for Chinese, the STOI measure for Japanese, and the BIF-modified CSIIm measure for English. Most of the objective measures examined performed differently even under the same conditions for different languages.

Journal of Neural Engineering | 2014

Subspace based adaptive denoising of surface EMG from neurological injury patients

Jie Liu; Dongwen Ying; William Z. Rymer; Ping Zhou

OBJECTIVE: After neurological injuries such as spinal cord injury, voluntary surface electromyogram (EMG) signals recorded from affected muscles are often corrupted by interferences, such as spurious involuntary spikes and background noises of physiological and extrinsic/accidental origin, imposing difficulties for signal processing. Conventional methods do not adequately mitigate such interferences. The aim of this study was to develop a subspace-based denoising method to suppress involuntary background spikes contaminating voluntary surface EMG recordings. APPROACH: The Karhunen-Loeve transform was utilized to decompose a noisy signal into a signal subspace and a noise subspace. An optimal estimate of the EMG signal is derived from the signal subspace and the noise power. Specifically, this estimator is capable of making a tradeoff between interference reduction and signal distortion. Since the estimator partially relies on the estimate of noise power, an adaptive method was presented to sequentially track the variation of interference power. The proposed method was evaluated using both semi-synthetic and real surface EMG signals. MAIN RESULTS: The experiments confirmed that the proposed method can effectively suppress interferences while keeping the distortion of the voluntary EMG signal at a low level. The proposed method can greatly facilitate further signal processing, such as onset detection of voluntary muscle activity. SIGNIFICANCE: The proposed method can provide a powerful tool for suppressing background spikes and noise contaminating voluntary surface EMG signals of paretic muscles after neurological injuries, which is of great importance for their multi-purpose applications.

IEEE Transactions on Audio, Speech, and Language Processing | 2013

Noise Estimation Using a Constrained Sequential Hidden Markov Model in the Log-Spectral Domain

Dongwen Ying; Yonghong Yan

The temporal correlation of speech presence/absence is widely used in noise estimation. The most popular technique for exploiting temporal correlation is the smoothing of noisy spectra using a time-recursive filter, in which the forgetting factor is controlled by speech presence probability. However, this technique is not unified into a theoretical framework that enables optimal noise estimation. In theory, hidden Markov models (HMMs) are superior to this technique in modeling temporal correlation. HMMs can model a time sequence of presence/absence of speech signal as a dynamic process of the transition between speech and non-speech states. Moreover, a number of methods, such as maximum likelihood, are available for optimal estimation of HMM parameters. This paper presents a constrained sequential HMM for modeling the log-power sequence on each frequency band. The emission probability of each HMM state is represented by a Gaussian model. The Gaussian mean of the non-speech state is considered as the optimal estimate of noise logarithmic power. The HMM parameter set is sequentially estimated from one frame to another on the basis of maximum likelihood. The proposed method is compared with well-established algorithms through various experiments. Our method delivers more accurate results and does not rely on the assumption of the “non-speech signal onset” as do most algorithms.

IEEE Global Conference on Signal and Information Processing | 2015

A closed-form method of spatial de-aliasing for multiple speech source localization

Dongwen Ying; Ge Zhan; Zhaoqiong Huang; Yonghong Yan; Fei Li

Sparsity-based methods are widely used to localize multiple speech sources because of their high computational efficiency. However, spatial aliasing is a challenging issue for sparsity-based speech source localization. For a pair of widely spaced microphones, there may be several candidate time delays corresponding to a given phase difference at some high frequencies. Especially for planar arrays, there may exist a large number of possible combinations of these time-delay candidates across all microphone pairs. The purpose of spatial de-aliasing is to determine the number of aliasing periods and select the optimal combination from those aliasing combinations. This paper proposes a closed-form method of spatial de-aliasing for planar arrays. The convex cost function is defined as the weighted error function of the numbers of aliasing periods. The solution for the numbers of aliasing periods is given by minimizing the cost function. The proposed method was evaluated in a simulated environment. The experimental results confirmed that the proposed method handles spatial aliasing effectively.
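The ambiguity that de-aliasing has to resolve can be made concrete with a short sketch that enumerates the candidate time delays for one microphone pair. It only illustrates the problem: the paper's closed-form selection of aliasing periods is not reproduced here. The function name, the speed of sound, and the convention that a delay tau produces a phase of 2*pi*f*tau are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def candidate_delays(phase, freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """List (k, tau) pairs consistent with a wrapped phase difference.

    A delay tau gives phase 2*pi*f*tau, observed only modulo 2*pi, so
    tau = (phase / (2*pi) + k) / f for some integer number of aliasing
    periods k. Physically, |tau| cannot exceed spacing / c.
    """
    max_delay = spacing_m / c
    base_cycles = phase / (2.0 * math.pi)
    k_min = math.ceil(-freq_hz * max_delay - base_cycles)
    k_max = math.floor(freq_hz * max_delay - base_cycles)
    return [(k, (base_cycles + k) / freq_hz) for k in range(k_min, k_max + 1)]
```

For a 0.2 m pair, a 500 Hz component has a single candidate, while a 4 kHz component has several. With many pairs in a planar array, the number of candidate combinations grows multiplicatively, which is why exhaustive search becomes costly.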