Jidong Tao
Princeton University
Publications
Featured research published by Jidong Tao.
international conference on acoustics, speech, and signal processing | 2016
Jidong Tao; Shabnam Ghaffarzadegan; Lei Chen; Klaus Zechner
We investigate two deep learning architectures reported to outperform conventional GMM systems in ASR, with respect to automatic speech scoring. We use an approximately 800-hour large-vocabulary non-native spontaneous English corpus to build three ASR systems: one GMM-based and two with deep learning architectures, namely a DNN and a Tandem system with bottleneck features. The evaluation results show that both deep learning systems significantly outperform the GMM ASR. These ASR systems are used as the front end in building an automated speech scoring system. To examine the effectiveness of the deep learning ASR systems for automated scoring, another non-native spontaneous speech corpus is used to train and evaluate the scoring models. Although the ASR accuracies of the deep learning architectures drop significantly on the scoring corpus, the performance of the corresponding scoring systems gets closer to that of human raters and is consistently better than that of the GMM-based system. Compared to the DNN ASR, the Tandem system performs slightly better on the scoring speech while being slightly less accurate on the ASR evaluation dataset. Furthermore, given its improved scoring performance while using fewer scoring features, the Tandem system is more robust for the scoring task than the DNN one.
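The Tandem system in this abstract relies on "bottleneck features": activations of a deliberately narrow hidden layer of a DNN, which then serve as input features for a conventional recognizer. The following toy NumPy sketch illustrates only that extraction step; the layer sizes, weights, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# A toy DNN: 40-dim acoustic frames -> 512 units -> 13-unit bottleneck.
W1 = rng.standard_normal((40, 512)) * 0.05
W2 = rng.standard_normal((512, 13)) * 0.05   # the narrow bottleneck layer

def bottleneck_features(frames):
    """Return the 13-dim bottleneck activations for each input frame."""
    h1 = relu(frames @ W1)
    return relu(h1 @ W2)   # in a Tandem setup these feed the GMM front end

frames = rng.standard_normal((100, 40))      # 100 toy acoustic frames
bnf = bottleneck_features(frames)
print(bnf.shape)                             # (100, 13)
```

In a real Tandem system the DNN would first be trained as a phone classifier (with more layers above the bottleneck), and the bottleneck activations would typically be decorrelated before being modeled by the GMM.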
ieee automatic speech recognition and understanding workshop | 2015
Zhou Yu; Vikram Ramanarayanan; David Suendermann-Oeft; Xinhao Wang; Klaus Zechner; Lei Chen; Jidong Tao; Aliaksei Ivanou; Yao Qian
We introduce a new method to grade non-native spoken language tests automatically. Traditional automated response grading approaches use manually engineered time-aggregated features (such as the mean length of pauses). We propose to incorporate general time-sequence features (such as pitch), which preserve more information than time-aggregated features and do not require human effort to design. We use a type of recurrent neural network to jointly optimize the learning of high-level abstractions from time-sequence features together with the time-aggregated features. We first automatically learn high-level abstractions from time-sequence features with a Bidirectional Long Short-Term Memory (BLSTM) network and then combine those abstractions with time-aggregated features in a Multilayer Perceptron (MLP)/Linear Regression (LR) model. We optimize the BLSTM and the MLP/LR jointly. We find that such models reach the best performance in terms of correlation with human raters. We also find that when only limited time-aggregated features are available, our model that incorporates time-sequence features improves performance drastically.
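The architecture described above can be sketched as follows: a BLSTM learns abstractions from frame-level time-sequence features, its final states are concatenated with the hand-crafted time-aggregated features, and an MLP regressor produces the score, with one optimizer updating both parts jointly. This is a minimal PyTorch sketch under assumed dimensions; all names and sizes are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class JointScorer(nn.Module):
    def __init__(self, seq_dim=3, agg_dim=10, hidden=32):
        super().__init__()
        # BLSTM over time-sequence features (e.g. frame-level pitch).
        self.blstm = nn.LSTM(seq_dim, hidden, batch_first=True,
                             bidirectional=True)
        # MLP over [BLSTM abstraction ; time-aggregated features].
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden + agg_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar proficiency score
        )

    def forward(self, seq, agg):
        _, (h, _) = self.blstm(seq)              # h: (2, batch, hidden)
        abstraction = torch.cat([h[0], h[1]], dim=-1)
        return self.mlp(torch.cat([abstraction, agg], dim=-1))

model = JointScorer()
seq = torch.randn(4, 100, 3)   # 4 responses, 100 frames, 3 seq features
agg = torch.randn(4, 10)       # 10 time-aggregated features per response
score = model(seq, agg)        # one backward pass trains BLSTM and MLP jointly
print(score.shape)             # torch.Size([4, 1])
```

Training with a single optimizer over `model.parameters()` is what makes the optimization joint: gradients from the scoring loss flow back through the MLP into the BLSTM.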
annual meeting of the special interest group on discourse and dialogue | 2015
Alexei V. Ivanov; Vikram Ramanarayanan; David Suendermann-Oeft; Melissa Lopez; Keelan Evanini; Jidong Tao
Dialogue interaction with remote interlocutors is a difficult application area for speech recognition technology because of the limited duration of acoustic context available for adaptation, the narrow-band, compressed signal encoding used in telecommunications, the high variability of spontaneous speech, and processing time constraints. It is even more difficult when interacting with non-native speakers because of broader allophonic variation, less canonical prosodic patterns, a higher rate of false starts and incomplete words, unusual word choices, and a lower probability of grammatically well-formed sentences. We present a comparative study of various approaches to speech recognition in a non-native context. Comparing systems in terms of their accuracy and real-time factor, we find that a Kaldi-based Deep Neural Network Acoustic Model (DNN-AM) system with online speaker adaptation by far outperforms the other available methods.
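The two evaluation axes in this comparison, accuracy and real-time factor, are both simple to compute. The sketch below shows the standard definitions: real-time factor as processing time over audio duration, and word error rate as word-level edit distance over reference length. The numbers in the usage lines are made-up illustrations, not results from the paper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF < 1 means the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds

def word_error_rate(ref, hyp):
    """Levenshtein word distance over reference length; substitutions,
    insertions, and deletions all cost 1."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(real_time_factor(30.0, 60.0))                  # 0.5
print(word_error_rate("the cat sat".split(),
                      "the cat sat down".split()))   # 1 insertion / 3 words
```

The trade-off the paper examines is between these two: a more accurate system (lower WER) is only usable in live dialogue if its RTF also stays near or below 1.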
conference of the international speech communication association | 2016
Yao Qian; Jidong Tao; David Suendermann-Oeft; Keelan Evanini; Alexei V. Ivanov; Vikram Ramanarayanan
Recently, text-independent speaker recognition systems with phonetically aware DNNs, which allow comparison among different speakers with "soft-aligned" phonetic content, have significantly outperformed standard i-vector based systems [912]. However, when applied to speaker recognition on a non-native spontaneous speech corpus, DNN-based speaker recognition does not show superior performance, due to the relatively low accuracy of phonetic content recognition. In this paper, noise-aware features and multi-task learning are investigated to improve the alignment of speech feature frames into the sub-phonemic "senone" space and to "distill" the L1 (native language) information of the test takers into bottleneck features (BNFs), which we refer to as metadata-sensitive BNFs. Experimental results show that the system with metadata-sensitive BNFs improves speaker recognition performance by a 23.9% relative reduction in equal error rate (EER) compared to the baseline i-vector system. In addition, the L1 information is used only to train the BNF extractor, so it is not needed as input for BNF extraction, i-vector extraction, or scoring on the enrollment and evaluation sets, which avoids relying on erroneous L1s claimed by impostors.
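The equal error rate (EER) reported above is the operating point where the false-acceptance rate equals the false-rejection rate as the decision threshold is swept. A small self-contained NumPy sketch of that computation, on toy scores rather than data from the paper:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: threshold where false-acceptance rate == false-rejection rate.
    `scores` are higher for more likely targets; `labels` are 1 for target
    trials and 0 for impostor trials."""
    order = np.argsort(scores)[::-1]             # descending score order
    labels = np.asarray(labels)[order]
    n_target = labels.sum()
    n_impostor = len(labels) - n_target
    # Accepting the top-k scores for every k sweeps the decision threshold.
    fr = 1.0 - np.cumsum(labels) / n_target      # targets rejected so far
    fa = np.cumsum(1 - labels) / n_impostor      # impostors accepted so far
    i = np.argmin(np.abs(fr - fa))               # closest crossing point
    return (fr[i] + fa[i]) / 2.0

# Perfectly separated toy scores give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 0.0
```

A "23.9% relative reduction" then means the new system's EER is 23.9% lower than the baseline's EER, not 23.9 percentage points lower.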
WOCCI | 2014
Keelan Evanini; Youngsoon So; Jidong Tao; Diego Zapata-Rivera; Christine Luce; Laura Battistini; Xinhao Wang
international conference on acoustics, speech, and signal processing | 2018
Lei Chen; Jidong Tao; Shabnam Ghaffarzadegan; Yao Qian
Archive | 2018
Yao Qian; Jidong Tao; David Suendermann-Oeft; Keelan Evanini; Alexei V. Ivanov; Vikram Ramanarayanan
ETS Research Report Series | 2018
Lei Chen; Klaus Zechner; Su-Youn Yoon; Keelan Evanini; Xinhao Wang; Anastassia Loukina; Jidong Tao; Lawrence Davis; Chong Min Lee; Min Ma; Robert Mundkowsky; Chi Lu; Chee Wee Leong; Binod Gyawali
Archive | 2017
Jidong Tao; Lei Chen; Chong Min Lee
ETS Research Report Series | 2017
Anastassia Loukina; Klaus Zechner; Su-Youn Yoon; Mo Zhang; Jidong Tao; Xinhao Wang; Chong Min Lee; Matthew Mulholland