Shuai Nie
Chinese Academy of Sciences
Publications
Featured research published by Shuai Nie.
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Xueliang Zhang; Hui Zhang; Shuai Nie; Guanglai Gao; Wenju Liu
Speech separation and pitch estimation in noisy conditions are considered a “chicken-and-egg” problem. On one hand, pitch information is an important cue for speech separation. On the other hand, speech separation makes pitch estimation easier once background noise is removed. In this paper, we propose a supervised learning architecture that solves these two problems iteratively. The proposed algorithm is based on the deep stacking network (DSN), which provides a method for stacking simple processing modules to build deep architectures. Each module is a classifier whose target is the ideal binary mask (IBM), and whose input vector includes spectral features, pitch-based features, and the output of the previous module. During the testing stage, we estimate the pitch from the separation results and pass the updated pitch-based features to the next module. Embedded in the DSN this way, pitch estimation and speech separation each run several times, and we take the final results from the last module. Systematic evaluations show that the proposed system produces both a high-quality estimated binary mask and accurate pitch estimates, and outperforms recent systems in generalization ability.
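The iterative structure described here is easy to picture in code. Below is a minimal sketch of the test-stage loop, under stated assumptions: `modules` stands for a list of trained per-module classifiers and `reestimate_pitch_feats` is a hypothetical stand-in for the pitch re-estimation step; neither name is taken from the paper.

```python
import numpy as np

def run_dsn(spectral_feats, pitch_feats, modules, reestimate_pitch_feats):
    """Test-stage loop of the stacked architecture: each module classifies
    T-F units toward the IBM from spectral features, pitch-based features,
    and the previous module's output; the resulting mask is then used to
    re-estimate the pitch-based features fed to the next module."""
    prev_out = np.zeros((spectral_feats.shape[0], 0))  # module 1 has no predecessor
    mask = None
    for module in modules:
        x = np.concatenate([spectral_feats, pitch_feats, prev_out], axis=1)
        mask = module(x)                            # estimated IBM for this module
        pitch_feats = reestimate_pitch_feats(mask)  # pitch from the separated speech
        prev_out = mask
    return mask, pitch_feats
```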
International Conference on Acoustics, Speech, and Signal Processing | 2014
Shuai Nie; Hui Zhang; Xueliang Zhang; Wenju Liu
In many current speech separation approaches, the separation task is formulated as a binary classification problem. Several classification-based approaches have been proposed and perform satisfactorily. However, they do not explicitly model correlation in time: each time-frequency (T-F) unit is still classified individually. The speech signal carries rich temporal dynamics that can be exploited for speech separation. In this study, we incorporate temporal correlation into the classification. Compared with previous approaches, the proposed approach achieves better separation and generalization performance by using deep stacking networks (DSN) with time-series inputs and a re-thresholding method.
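One simple way to give a per-unit classifier access to temporal context, in the spirit of this approach, is to splice neighboring frames into each input vector and then binarize the outputs with a tunable threshold. The windowing and re-thresholding rule below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def splice_context(features, left=2, right=2):
    """Concatenate each frame with its neighbors so the classifier sees
    temporal context (a common way to inject time-series information)."""
    T, _ = features.shape
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(left + right + 1)])

def rethreshold(scores, theta=0.5):
    """Binarize classifier outputs into a mask; in a re-thresholding
    scheme, theta would be tuned rather than fixed at 0.5."""
    return (scores > theta).astype(np.float32)
```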
International Conference on Acoustics, Speech, and Signal Processing | 2015
Hui Zhang; Xueliang Zhang; Shuai Nie; Guanglai Gao; Wenju Liu
Pitch information is an important cue for speech separation. However, pitch estimation in noisy conditions is as challenging a task as speech separation itself. In this paper, we propose a supervised learning architecture that combines these two problems concisely. The proposed algorithm is based on the deep stacking network (DSN), which provides a method for stacking simple processing modules to build a deep architecture. In the training stage, the ideal binary mask is used as the target. The input vector includes the outputs of the lower module and frame-level features consisting of spectral and pitch-based features. In the testing stage, each module provides an estimated binary mask, which is used to re-estimate the pitch; the updated pitch-based features are then passed to the next module. This procedure is embedded iteratively in the DSN, and we obtain the final separation results from its last module. Systematic evaluations show that the proposed approach produces a high-quality estimated binary mask and outperforms recent systems in generalization.
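For reference, the ideal binary mask used as the training target here and in the related papers above is the standard one: a T-F unit is labeled speech-dominant when its local SNR exceeds a local criterion (LC). A minimal sketch:

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Standard IBM training target: label a T-F unit 1 when its local
    SNR (in dB) exceeds the local criterion, else 0."""
    snr_db = 10.0 * np.log10(speech_power / np.maximum(noise_power, 1e-12))
    return (snr_db > lc_db).astype(np.float32)
```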
International Conference on Acoustics, Speech, and Signal Processing | 2016
Shuai Nie; Shan Liang; Hao Li; Xueliang Zhang; Zhanlei Yang; Wenju Liu; Like Dong
The targets of speech separation, whether ideal masks or magnitude spectrograms of interest, have prominent spectro-temporal structures. These characteristics are well worth exploiting for speech separation, yet they are usually ignored in previous work. In this paper, we use nonnegative matrix factorization (NMF) to exploit the spectro-temporal structures of magnitude spectrograms. With nonnegativity constraints, NMF can capture the basis spectral patterns of speech and noise. The learned basis spectra are then integrated into a deep neural network (DNN) to reconstruct the magnitude spectrograms of speech and noise as nonnegative linear combinations of the bases. Using the reconstructed spectrograms, we further explore a discriminative training objective and a joint optimization framework for the proposed model. Systematic experiments show that the proposed model is competitive with previous methods on monaural speech separation tasks.
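A rough sketch of the basis-learning step follows, assuming scikit-learn's NMF and random placeholder arrays in place of real STFT magnitudes; the DNN that would produce the nonnegative activations is omitted, and all names are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

# Placeholder |STFT| magnitudes (freq bins x frames); real data would
# come from STFTs of speech-only and noise-only training material.
speech_mag = np.abs(np.random.randn(257, 1000))
noise_mag = np.abs(np.random.randn(257, 1000))

# Learn basis spectra for speech and noise separately (columns are bases).
W_speech = NMF(n_components=40, max_iter=300).fit(speech_mag.T).components_.T
W_noise = NMF(n_components=40, max_iter=300).fit(noise_mag.T).components_.T

def reconstruct(h_speech, h_noise):
    """In the paper's spirit, a DNN outputs nonnegative activations h and
    the magnitude spectrograms are rebuilt as nonnegative linear
    combinations of the fixed, learned bases."""
    return W_speech @ h_speech, W_noise @ h_noise
```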
Conference of the International Speech Communication Association | 2016
Hao Li; Shuai Nie; Xueliang Zhang; Hui Zhang
Convolutive non-negative matrix factorization (CNMF) and deep neural networks (DNN) are two effective methods for monaural speech separation. A conventional DNN focuses on building the non-linear relationship between the mixture and the target speech, but ignores the prominent structure of the target speech. A conventional CNMF model captures the prominent harmonic structures and temporal continuities of speech, but ignores the non-linear relationship between the mixture and the target. Taking both aspects into account at the same time may yield better performance. In this paper, we propose a joint optimization of DNN models with an extra CNMF layer for the speech separation task. We also add an extra masking layer to the proposed model to constrain the speech reconstruction. Moreover, a discriminative training criterion is proposed to further enhance separation performance. Experimental results show that the proposed model achieves significant improvements in PESQ, SAR, SIR, and SDR compared with conventional methods.
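A masking layer of the kind mentioned above is commonly realized as a soft ratio mask computed from the two estimated spectrograms and applied to the mixture, so the reconstruction is constrained by the mixture energy. A minimal sketch under that common formulation (the paper's exact layer may differ):

```python
import numpy as np

def masking_layer(speech_est, noise_est, mixture_mag, eps=1e-8):
    """Soft ratio mask from the estimated speech and noise magnitude
    spectrograms, applied to the mixture magnitude."""
    mask = speech_est / (speech_est + noise_est + eps)
    return mask * mixture_mag
```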
Pattern Recognition Letters | 2018
Yaping Zhang; Shan Liang; Shuai Nie; Wenju Liu; Shouye Peng
Deep convolutional neural networks have made great progress in recent handwritten character recognition (HCR) by learning discriminative features from large amounts of labeled data. However, the large variance of handwriting styles across writers remains a major challenge for robust HCR. An intuitive way to alleviate this issue is to extract writer-independent semantic features from handwritten characters; standard printed characters are writer-independent stencils for handwritten characters, so they can serve as prior knowledge that guides models to exploit writer-independent semantic features for HCR. In this paper, we propose a novel adversarial feature learning (AFL) model that incorporates the prior knowledge of printed data and writer-independent semantic features to improve HCR performance on limited training data. Unlike existing handcrafted-feature methods, the proposed AFL model exploits writer-independent semantic features automatically, and the prior knowledge from standard printed data is learned rather than hand-designed. Systematic experiments on MNIST and CASIA-HWDB show that the proposed model is competitive with state-of-the-art methods on the offline HCR task.
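As a sketch only: adversarial feature learning of this kind is often implemented with a gradient-reversal layer, where a domain discriminator tries to tell printed from handwritten features while the encoder learns to defeat it. The PyTorch snippet below is a hypothetical minimal instance of that general pattern, not the paper's AFL architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass,
    so the encoder learns features the domain discriminator cannot separate."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

encoder = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(4), nn.Flatten())
classifier = nn.Linear(32 * 16, 10)   # character classes (e.g. MNIST digits)
domain_disc = nn.Linear(32 * 16, 2)   # printed vs. handwritten

x = torch.randn(8, 1, 28, 28)         # a batch of character images
feats = encoder(x)
char_logits = classifier(feats)                           # recognition head
dom_logits = domain_disc(GradReverse.apply(feats, 1.0))   # adversarial head
```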
International Conference on Acoustics, Speech, and Signal Processing | 2018
Bin Liu; Shuai Nie; Yaping Zhang; Dengfeng Ke; Shan Liang; Wenju Liu
International Symposium on Neural Networks | 2018
Bin Liu; Shuai Nie; Shan Liang; Zhanlei Yang; Wenju Liu
Conference of the International Speech Communication Association | 2018
Shuai Nie; Shan Liang; Bin Liu; Yaping Zhang; Wenju Liu; Jianhua Tao
IEEE Transactions on Audio, Speech, and Language Processing | 2018
Shuai Nie; Shan Liang; Wenju Liu; Xueliang Zhang; Jianhua Tao