
Publication


Featured research published by Zhiyuan Tang.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Recurrent neural network training with dark knowledge transfer

Zhiyuan Tang; Dong Wang; Zhiyong Zhang

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have gained much attention in automatic speech recognition (ASR). Although some success stories have been reported, training RNNs remains highly challenging, especially with limited training data. Recent research found that a well-trained model can be used as a teacher to train other child models, by using the predictions generated by the teacher model as supervision. This knowledge transfer learning has been employed to train simple neural nets with a complex one, so that the final performance can reach a level that is infeasible to obtain by regular training. In this paper, we employ the knowledge transfer learning approach to train RNNs (specifically LSTMs) using a deep neural network (DNN) model as the teacher. This differs from most existing research on knowledge transfer learning, since the teacher (DNN) is assumed to be weaker than the child (RNN); however, our experiments on an ASR task show that it works fairly well: without applying any tricks to the learning scheme, this approach can train RNNs successfully even with limited training data.
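The teacher-student scheme described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class count, temperature value, and frame shapes below are placeholders, and the "dark knowledge" lives in the teacher's temperature-softened posteriors that supervise the student.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T exposes the small posterior
    # probabilities ("dark knowledge") in the teacher's predictions.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's softened posteriors and the
    # student's softened predictions, averaged over frames.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))  # 4 frames, 10 output classes (hypothetical)
# A student whose logits match the teacher's incurs a lower loss than an
# unrelated student, so minimizing this loss pulls the student toward the teacher.
matched = distillation_loss(teacher, teacher)
random_ = distillation_loss(rng.normal(size=(4, 10)), teacher)
print(matched < random_)
```

In the paper's setting the student would be an LSTM and the teacher a DNN; this loss is what replaces (or augments) the usual hard-label cross-entropy during training.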


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition

Zhiyuan Tang; Lantian Li; Dong Wang; Ravichander Vipperla

Automatic speech and speaker recognition are traditionally treated as two independent tasks and are studied separately. The human brain, in contrast, deciphers the linguistic content and the speaker traits from speech in a collaborative manner. This key observation motivates the work presented in this paper. A collaborative joint training approach based on multitask recurrent neural network models is proposed, in which the output of one task is backpropagated to the other tasks. This is a general framework for learning collaborative tasks and fits well with the goal of joint learning of automatic speech and speaker recognition. Through a comprehensive study, it is shown that the multitask recurrent neural net models deliver improved performance on both automatic speech and speaker recognition tasks compared to single-task systems. The strength of such multitask collaborative learning is analyzed, and the impact of various training configurations is investigated.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2016

Multi-task recurrent model for speech and speaker recognition

Zhiyuan Tang; Lantian Li; Dong Wang

Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two separate communities. This is certainly not how people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model that performs speech and speaker recognition simultaneously. The model is based on a unified neural network in which the output of one task is fed to the input of the other, leading to a multi-task recurrent network. Experiments show that the joint model outperforms the task-specific models on both tasks.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

Memory visualization for gated recurrent neural networks in speech recognition

Zhiyuan Tang; Ying Shi; Dong Wang; Yang Feng; Shiyue Zhang

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly those with gated units, such as long short-term memory (LSTM) and the gated recurrent unit (GRU). However, the dynamic properties behind this remarkable performance remain unclear in many applications, e.g., automatic speech recognition (ASR). This paper employs visualization techniques to study the behavior of LSTM and GRU when performing speech recognition tasks. Our experiments show some interesting patterns in the gated memory, and some of them have inspired simple yet effective modifications to the network structure. We report two such modifications: (1) lazy cell update in LSTM, and (2) shortcut connections for residual learning. Both modifications lead to more comprehensible and powerful networks.
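The second modification mentioned above, shortcut connections for residual learning, can be illustrated schematically for a stack of recurrent layers. This is a toy sketch assuming equal layer widths and a plain tanh cell in place of the paper's gated cells; layer count and dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def recurrent_layer_step(x, h, W, U):
    # One step of a plain recurrent layer (stand-in for LSTM/GRU).
    return np.tanh(W @ x + U @ h)

d, n_layers = 6, 3
W = rng.normal(scale=0.1, size=(n_layers, d, d))
U = rng.normal(scale=0.1, size=(n_layers, d, d))
h = np.zeros((n_layers, d))   # per-layer recurrent state

x = rng.normal(size=d)        # one input frame (toy data)
out = x
for l in range(n_layers):
    h[l] = recurrent_layer_step(out, h[l], W[l], U[l])
    # Shortcut connection: each layer learns a residual on top of its
    # input, so deep stacks stay trainable.
    out = h[l] + out

print(out.shape)
```

The shortcut requires matching input and output widths at each layer, which is why the sketch keeps a single dimension `d` throughout.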


IEEE Transactions on Audio, Speech, and Language Processing | 2018

Phonetic Temporal Neural Model for Language Identification

Zhiyuan Tang; Dong Wang; Yixiang Chen; Lantian Li; Andrew Abel

Deep neural models, particularly the long short-term memory recurrent neural network (LSTM-RNN) model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: It is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.


arXiv: Sound | 2017

Collaborative Learning for Language and Speaker Recognition

Lantian Li; Zhiyuan Tang; Dong Wang; Andrew Abel; Yang Feng; Shiyue Zhang

This paper presents a unified model that performs language and speaker recognition simultaneously. The model is based on a multi-task recurrent neural network in which the output of one task is fed as input to the other, leading to a collaborative learning framework that can improve both language and speaker recognition by borrowing information from each other. Our experiments demonstrate that the multi-task model outperforms the task-specific models on both tasks.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2016

Multi-task recurrent model for true multilingual speech recognition

Zhiyuan Tang; Lantian Li; Dong Wang

Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature-sharing structure. This approach has been found effective in improving performance on each individual language. However, it is only useful when the deployed system supports just one language. In a true multilingual scenario where multiple languages are allowed, performance is significantly reduced by competition among languages in the decoding space. This paper presents a multi-task recurrent model that involves a multilingual automatic speech recognition (ASR) component and a language recognition (LR) component; the ASR component is informed of the language information by the LR component, leading to language-aware recognition. We tested the approach on an English-Chinese bilingual recognition task. The results show that the proposed multi-task recurrent model improves the performance of multilingual recognition systems.


arXiv: Learning | 2015

Knowledge Transfer Pre-training

Zhiyuan Tang; Dong Wang; Yiqiao Pan; Zhiyong Zhang


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2017

AP17-OLR challenge: Data, plan, and baseline

Zhiyuan Tang; Dong Wang; Yixiang Chen; Qing Chen


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2018

Deep Factorization for Speech Signal

Lantian Li; Dong Wang; Yixiang Chen; Ying Shi; Zhiyuan Tang; Thomas Fang Zheng

Collaboration


Dive into Zhiyuan Tang's collaborations.

Top Co-Authors

Andrew Abel

Xi'an Jiaotong-Liverpool University
