Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Shuang Xu is active.

Publication


Featured research published by Shuang Xu.


Conference of the International Speech Communication Association | 2016

Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling.

Yuanyuan Zhao; Shuang Xu; Bo Xu

Theoretical and empirical evidence indicates that the depth of neural networks is crucial to acoustic modeling in speech recognition tasks. In practice, however, as depth increases, accuracy saturates and then degrades rapidly. In this paper, a novel multidimensional residual learning architecture is proposed to address this degradation of deep recurrent neural networks (RNNs) in acoustic modeling by exploiting both the spatial and temporal dimensions. In the spatial dimension, shortcut connections are introduced to RNNs, along which information can flow across several layers without attenuation. In the temporal dimension, the degradation problem is handled by regulating temporal granularity: the input sequence is split into several parallel sub-sequences, ensuring that information flows across the time axis unimpeded. Finally, a row convolution layer placed on top of all recurrent layers combines the appropriate information from the parallel sub-sequences before feeding the classifier. Experiments on two quite different speech recognition tasks show 10% relative performance improvements.
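
Below is a minimal sketch (not the authors' implementation) of the two ideas in this abstract: residual shortcut connections across stacked recurrent layers, and splitting the input frame sequence into parallel sub-sequences. The PyTorch framework, the layer sizes, and the helper names are illustrative assumptions; the row-convolution merge layer is omitted.

import torch
import torch.nn as nn

class ResidualRNNStack(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=320, num_layers=5):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, hidden_dim)
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden_dim, hidden_dim, batch_first=True) for _ in range(num_layers)]
        )

    def forward(self, x):                       # x: (batch, time, feat_dim)
        h = self.input_proj(x)
        for rnn in self.layers:
            out, _ = rnn(h)
            h = h + out                         # spatial shortcut: information skips across layers
        return h

def split_subsequences(x, k=2):
    # Temporal dimension: split the frame sequence into k parallel
    # sub-sequences (every k-th frame); the paper merges their outputs
    # with a row-convolution layer before the classifier.
    return [x[:, i::k, :] for i in range(k)]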


International Symposium on Neural Networks | 2017

A Class-Specific Copy Network for Handling the Rare Word Problem in Neural Machine Translation

Feng Wang; Wei Chen; Zhen Yang; Xiaowei Zhang; Shuang Xu; Bo Xu

Neural machine translation (NMT) has shown promising results and has rapidly gained adoption in many large-scale settings. As NMT models are widely used in production, their long-standing weakness in handling rare and out-of-vocabulary words has become more pronounced. To relieve the model of the burden of "understanding" rare words, a copy mechanism has been proposed to handle rare and unseen words in attention-based neural network models. However, the drawback of the copy mechanism is that the model can only decide whether or not to copy; it cannot determine which class the rare word should be copied as, such as person, location, or organization. This paper investigates this limitation of the NMT model and proposes a new NMT model that incorporates a class-specific copy network. With this network, the proposed model can decide which class each target word belongs to and which class in the source should be copied. Experimental results on Chinese-English translation tasks show that the proposed model outperforms the traditional NMT model by a large margin, especially for sentences containing rare words.
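
The following is a hypothetical sketch in the spirit of this abstract, not the paper's exact architecture: at each decoding step the model chooses between generating from the target vocabulary and copying the attended source word as one of several classes. The dimensions, the three-class inventory, and all names here are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassSpecificCopyGate(nn.Module):
    def __init__(self, hidden_dim=512, vocab_size=30000, num_classes=3):
        super().__init__()
        # score 0 = generate from the vocabulary; scores 1..num_classes = copy as class c
        self.mode_scores = nn.Linear(2 * hidden_dim, 1 + num_classes)
        self.generator = nn.Linear(hidden_dim, vocab_size)

    def forward(self, dec_state, attn_context, attn_weights):
        # dec_state, attn_context: (batch, hidden_dim)
        # attn_weights: (batch, src_len) attention over source positions
        mode = F.softmax(self.mode_scores(torch.cat([dec_state, attn_context], dim=-1)), dim=-1)
        p_gen = F.softmax(self.generator(dec_state), dim=-1)            # (batch, vocab)
        p_generate = mode[:, :1] * p_gen                                # generation distribution, scaled
        # For each copy class c, the copy distribution over source positions
        # is the attention weights scaled by that class's probability.
        p_copy = mode[:, 1:].unsqueeze(-1) * attn_weights.unsqueeze(1)  # (batch, classes, src_len)
        return p_generate, p_copy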


International Conference on Neural Information Processing | 2017

Word-Level Permutation and Improved Lower Frame Rate for RNN-Based Acoustic Modeling

Yuanyuan Zhao; Shiyu Zhou; Shuang Xu; Bo Xu

Recently, RNN-based acoustic models have shown promising performance. However, their ability to generalize across multiple scenarios is limited for two reasons. First, they encode inter-word dependencies, which conflicts with the principle that an acoustic model should model only the pronunciation of words. Second, by depicting the intra-word acoustic trajectory frame by frame, the RNN-based acoustic model is too precise to tolerate small distortions. In this work, we propose two variants to address these two problems. One is word-level permutation: the order of input features and their corresponding labels is shuffled with a suitable probability according to word boundaries, aiming to eliminate inter-word dependencies. The other is the improved LFR (iLFR) model, which equidistantly splits the original sentence into N utterances to avoid the data discarded by the LFR model. Results based on LSTM RNNs demonstrate a 7% relative performance improvement from combining word-level permutation and iLFR.
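
A minimal sketch of the two data-level variants described above, under assumed inputs (frame features, frame labels, and word-boundary indices); it is illustrative rather than the authors' code.

import random
import numpy as np

def word_level_permutation(frames, labels, word_bounds, prob=0.1):
    # frames: (T, feat_dim) acoustic features; labels: length-T frame labels;
    # word_bounds: list of (start, end) frame indices, one pair per word.
    # With probability `prob`, shuffle whole-word segments (features together
    # with their labels) to break inter-word dependencies.
    if random.random() >= prob:
        return frames, labels
    order = list(range(len(word_bounds)))
    random.shuffle(order)
    segs = [(frames[s:e], labels[s:e]) for s, e in word_bounds]
    frames = np.concatenate([segs[i][0] for i in order])
    labels = np.concatenate([segs[i][1] for i in order])
    return frames, labels

def ilfr_split(frames, n=3):
    # Improved lower frame rate: instead of keeping one low-frame-rate stream
    # and discarding the rest, emit n interleaved sub-utterances that together
    # cover every frame of the original sentence.
    return [frames[i::n] for i in range(n)]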


International Symposium on Chinese Spoken Language Processing | 2016

Investigating gated recurrent neural networks for acoustic modeling

Yuanyuan Zhao; Jie Li; Shuang Xu; Bo Xu

Recurrent neural networks (RNNs) with a gating mechanism, such as the gated recurrent unit (GRU), long short-term memory (LSTM), and long short-term memory projected (LSTMP), have been shown to give state-of-the-art performance in acoustic modeling. But little is known about why these gated RNNs work and how they differ from one another. Based on a series of experimental comparisons and analyses, we find that: (a) GRU usually performs better than LSTM, possibly because GRU can modulate the previous memory content through its learned reset gates, helping it model long-span dependencies in speech sequences more efficiently; (b) LSTMP shows performance comparable to GRU, since LSTMP has a similar ability to select and combine information through an automatically learned linear transformation with shared weights. In the experiments, a visual analysis method is adopted to understand the historical-information selection mechanism in RNNs, in contrast to DNNs. Experimental results on three different speech recognition tasks support these conclusions, with 5%–13% relative PER or CER reductions observed.
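
To make the reset-gate mechanism referenced in point (a) concrete, here is the standard GRU cell written out explicitly (a textbook formulation, not code from the paper); the reset gate r scales the previous hidden state before it enters the candidate activation, controlling how much old memory is exposed.

import torch
import torch.nn as nn

class MinimalGRUCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.Wz = nn.Linear(input_dim + hidden_dim, hidden_dim)  # update gate
        self.Wr = nn.Linear(input_dim + hidden_dim, hidden_dim)  # reset gate
        self.Wh = nn.Linear(input_dim + hidden_dim, hidden_dim)  # candidate state

    def forward(self, x_t, h_prev):
        xh = torch.cat([x_t, h_prev], dim=-1)
        z = torch.sigmoid(self.Wz(xh))            # how much of the state to update
        r = torch.sigmoid(self.Wr(xh))            # how much previous memory to expose
        h_tilde = torch.tanh(self.Wh(torch.cat([x_t, r * h_prev], dim=-1)))
        return (1 - z) * h_prev + z * h_tilde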


Conference of the International Speech Communication Association | 2018

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese.

Shiyu Zhou; Linhao Dong; Shuang Xu; Bo Xu


arXiv:eess.AS | 2018

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese.

Shiyu Zhou; Linhao Dong; Shuang Xu; Bo Xu


Empirical Methods in Natural Language Processing | 2017

Towards Compact and Fast Neural Machine Translation Using a Combined Method

Xiaowei Zhang; Wei Chen; Feng Wang; Shuang Xu; Bo Xu


International Symposium on Neural Networks | 2018

Syllable-Based Acoustic Modeling with CTC for Multi-Scenarios Mandarin Speech Recognition

Yuanyuan Zhao; Linhao Dong; Shuang Xu; Bo Xu


International Conference on Computational Linguistics | 2018

Semi-Supervised Disfluency Detection

Feng Wang; Zhen Yang; Wei Chen; Shuang Xu; Bo Xu; Qianqian Dong


International Conference on Acoustics, Speech, and Signal Processing | 2018

Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition.

Linhao Dong; Shuang Xu; Bo Xu

Collaboration


Dive into Shuang Xu's collaborations.

Top Co-Authors

Bo Xu, Chinese Academy of Sciences
Shiyu Zhou, Chinese Academy of Sciences
Yuanyuan Zhao, Chinese Academy of Sciences
Feng Wang, Chinese Academy of Sciences
Wei Chen, Chinese Academy of Sciences
Xiaowei Zhang, Chinese Academy of Sciences
Zhen Yang, Chinese Academy of Sciences
Jie Li, Chinese Academy of Sciences