Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Shuang Xu is active.

Publication


Featured research published by Shuang Xu.


Conference of the International Speech Communication Association | 2016

Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling.

Yuanyuan Zhao; Shuang Xu; Bo Xu

Theoretical and empirical evidence indicates that the depth of neural networks is crucial to acoustic modeling in speech recognition tasks. In practice, however, as depth increases, accuracy saturates and then degrades rapidly. In this paper, a novel multidimensional residual learning architecture is proposed to address this degradation of deep recurrent neural networks (RNNs) in acoustic modeling by exploiting both the spatial and temporal dimensions. In the spatial dimension, shortcut connections are introduced to RNNs, along which information can flow across several layers without attenuation. In the temporal dimension, the degradation problem is handled by regulating temporal granularity: the input sequence is split into several parallel sub-sequences, ensuring that information flows across the time axis unimpeded. Finally, a row convolution layer placed on top of all recurrent layers combines the appropriate information from the parallel sub-sequences before feeding the classifier. Experiments on two quite different speech recognition tasks show 10% relative performance improvements.
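
Below is a minimal sketch (not the authors' implementation) of the two ideas in this abstract: residual shortcut connections across stacked recurrent layers, and splitting the input frame sequence into parallel sub-sequences. The PyTorch framework, the layer sizes, and the helper names are illustrative assumptions; the row-convolution merge layer is omitted.

import torch
import torch.nn as nn

class ResidualRNNStack(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=320, num_layers=5):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, hidden_dim)
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden_dim, hidden_dim, batch_first=True) for _ in range(num_layers)]
        )

    def forward(self, x):                       # x: (batch, time, feat_dim)
        h = self.input_proj(x)
        for rnn in self.layers:
            out, _ = rnn(h)
            h = h + out                         # spatial shortcut: information skips across layers
        return h

def split_subsequences(x, k=2):
    # Temporal dimension: split the frame sequence into k parallel
    # sub-sequences (every k-th frame); the paper merges their outputs
    # with a row-convolution layer before the classifier.
    return [x[:, i::k, :] for i in range(k)]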


International Symposium on Neural Networks | 2017

A Class-Specific Copy Network for Handling the Rare Word Problem in Neural Machine Translation

Feng Wang; Wei Chen; Zhen Yang; Xiaowei Zhang; Shuang Xu; Bo Xu

Neural machine translation (NMT) has shown promising results and has rapidly gained adoption in many large-scale settings. As NMT models are widely used in production, their long-standing weakness in handling rare and out-of-vocabulary words has become more pronounced. To relieve the model of the burden of "understanding" rare words, a copy mechanism has been proposed to handle rare and unseen words in attention-based neural network models. However, the drawback of the copy mechanism is that the model can only decide whether or not to copy; it cannot determine which class the rare word should be copied as, such as person, location, or organization. This paper investigates this limitation of the NMT model and proposes a new NMT model that incorporates a class-specific copy network. With this network, the proposed model can decide which class each target word belongs to and which class in the source should be copied. Experimental results on Chinese-English translation tasks show that the proposed model outperforms the traditional NMT model by a large margin, especially for sentences containing rare words.
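
The following is a hypothetical sketch in the spirit of this abstract, not the paper's exact architecture: at each decoding step the model chooses between generating from the target vocabulary and copying the attended source word as one of several classes. The dimensions, the three-class inventory, and all names here are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassSpecificCopyGate(nn.Module):
    def __init__(self, hidden_dim=512, vocab_size=30000, num_classes=3):
        super().__init__()
        # score 0 = generate from the vocabulary; scores 1..num_classes = copy as class c
        self.mode_scores = nn.Linear(2 * hidden_dim, 1 + num_classes)
        self.generator = nn.Linear(hidden_dim, vocab_size)

    def forward(self, dec_state, attn_context, attn_weights):
        # dec_state, attn_context: (batch, hidden_dim)
        # attn_weights: (batch, src_len) attention over source positions
        mode = F.softmax(self.mode_scores(torch.cat([dec_state, attn_context], dim=-1)), dim=-1)
        p_gen = F.softmax(self.generator(dec_state), dim=-1)            # (batch, vocab)
        p_generate = mode[:, :1] * p_gen                                # generation distribution, scaled
        # For each copy class c, the copy distribution over source positions
        # is the attention weights scaled by that class's probability.
        p_copy = mode[:, 1:].unsqueeze(-1) * attn_weights.unsqueeze(1)  # (batch, classes, src_len)
        return p_generate, p_copy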


International Conference on Neural Information Processing | 2017

Word-Level Permutation and Improved Lower Frame Rate for RNN-Based Acoustic Modeling

Yuanyuan Zhao; Shiyu Zhou; Shuang Xu; Bo Xu

Recently, RNN-based acoustic models have shown promising performance. However, their ability to generalize across multiple scenarios is limited for two reasons. First, they encode inter-word dependencies, which conflicts with the principle that an acoustic model should model only the pronunciation of words. Second, by depicting the intra-word acoustic trajectory frame by frame, the RNN-based acoustic model is too precise to tolerate small distortions. In this work, we propose two variants to address these two problems. One is word-level permutation: the order of input features and their corresponding labels is shuffled with a suitable probability according to word boundaries, aiming to eliminate inter-word dependencies. The other is the improved LFR (iLFR) model, which equidistantly splits the original sentence into N utterances to avoid the data discarded by the LFR model. Results based on LSTM RNNs demonstrate a 7% relative performance improvement from combining word-level permutation and iLFR.
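
A minimal sketch of the two data-level variants described above, under assumed inputs (frame features, frame labels, and word-boundary indices); it is illustrative rather than the authors' code.

import random
import numpy as np

def word_level_permutation(frames, labels, word_bounds, prob=0.1):
    # frames: (T, feat_dim) acoustic features; labels: length-T frame labels;
    # word_bounds: list of (start, end) frame indices, one pair per word.
    # With probability `prob`, shuffle whole-word segments (features together
    # with their labels) to break inter-word dependencies.
    if random.random() >= prob:
        return frames, labels
    order = list(range(len(word_bounds)))
    random.shuffle(order)
    segs = [(frames[s:e], labels[s:e]) for s, e in word_bounds]
    frames = np.concatenate([segs[i][0] for i in order])
    labels = np.concatenate([segs[i][1] for i in order])
    return frames, labels

def ilfr_split(frames, n=3):
    # Improved lower frame rate: instead of keeping one low-frame-rate stream
    # and discarding the rest, emit n interleaved sub-utterances that together
    # cover every frame of the original sentence.
    return [frames[i::n] for i in range(n)]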


International Symposium on Chinese Spoken Language Processing | 2016

Investigating gated recurrent neural networks for acoustic modeling

Yuanyuan Zhao; Jie Li; Shuang Xu; Bo Xu

Recurrent neural networks (RNNs) with a gating mechanism, such as the gated recurrent unit (GRU), long short-term memory (LSTM), and long short-term memory projected (LSTMP), have been shown to give state-of-the-art performance in acoustic modeling. But little is known about why these gated RNNs work and how they differ from one another. Based on a series of experimental comparisons and analyses, we find that: (a) GRU usually performs better than LSTM, possibly because GRU can modulate the previous memory content through its learned reset gates, helping it model long-span dependencies in speech sequences more efficiently; (b) LSTMP shows performance comparable to GRU, since LSTMP has a similar ability to select and combine information through an automatically learned linear transformation with shared weights. In the experiments, a visual analysis method is adopted to understand the historical-information selection mechanism in RNNs, in contrast to DNNs. Experimental results on three different speech recognition tasks support these conclusions, with 5%–13% relative PER or CER reductions observed.
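
To make the reset-gate mechanism referenced in point (a) concrete, here is the standard GRU cell written out explicitly (a textbook formulation, not code from the paper); the reset gate r scales the previous hidden state before it enters the candidate activation, controlling how much old memory is exposed.

import torch
import torch.nn as nn

class MinimalGRUCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.Wz = nn.Linear(input_dim + hidden_dim, hidden_dim)  # update gate
        self.Wr = nn.Linear(input_dim + hidden_dim, hidden_dim)  # reset gate
        self.Wh = nn.Linear(input_dim + hidden_dim, hidden_dim)  # candidate state

    def forward(self, x_t, h_prev):
        xh = torch.cat([x_t, h_prev], dim=-1)
        z = torch.sigmoid(self.Wz(xh))            # how much of the state to update
        r = torch.sigmoid(self.Wr(xh))            # how much previous memory to expose
        h_tilde = torch.tanh(self.Wh(torch.cat([x_t, r * h_prev], dim=-1)))
        return (1 - z) * h_prev + z * h_tilde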


Conference of the International Speech Communication Association | 2018

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese.

Shiyu Zhou; Linhao Dong; Shuang Xu; Bo Xu


arXiv:eess.AS | 2018

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese.

Shiyu Zhou; Linhao Dong; Shuang Xu; Bo Xu


Empirical Methods in Natural Language Processing | 2017

Towards Compact and Fast Neural Machine Translation Using a Combined Method

Xiaowei Zhang; Wei Chen; Feng Wang; Shuang Xu; Bo Xu


International Symposium on Neural Networks | 2018

Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin Speech Recognition

Yuanyuan Zhao; Linhao Dong; Shuang Xu; Bo Xu


International Conference on Computational Linguistics | 2018

Semi-Supervised Disfluency Detection

Feng Wang; Zhen Yang; Wei Chen; Shuang Xu; Bo Xu; Qianqian Dong


International Conference on Acoustics, Speech, and Signal Processing | 2018

Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition.

Linhao Dong; Shuang Xu; Bo Xu

Collaboration


Dive into Shuang Xu's collaborations.

Top Co-Authors

Bo Xu, Chinese Academy of Sciences
Shiyu Zhou, Chinese Academy of Sciences
Yuanyuan Zhao, Chinese Academy of Sciences
Feng Wang, Chinese Academy of Sciences
Wei Chen, Chinese Academy of Sciences
Xiaowei Zhang, Chinese Academy of Sciences
Zhen Yang, Chinese Academy of Sciences
Jie Li, Chinese Academy of Sciences