Jidong Tao
Princeton University
Publications
Featured research published by Jidong Tao.
international conference on acoustics, speech, and signal processing | 2016
Jidong Tao; Shabnam Ghaffarzadegan; Lei Chen; Klaus Zechner
We investigate two deep learning architectures reported to outperform conventional GMM systems in ASR, with respect to automatic speech scoring. We use an approximately 800-hour large-vocabulary non-native spontaneous English corpus to build three ASR systems: one GMM-based and two with deep learning architectures, namely a DNN and a Tandem system with bottleneck features. The evaluation results show that both deep learning systems significantly outperform the GMM ASR. These ASR systems are used as the front end in building an automated speech scoring system. To examine the effectiveness of the deep learning ASR systems for automated scoring, another non-native spontaneous speech corpus is used to train and evaluate the scoring models. Although the ASR accuracies of the deep learning architectures drop significantly on the scoring corpus, the performance of the corresponding scoring systems gets closer to that of human raters and is consistently better than that of the GMM-based system. Compared to the DNN ASR, the Tandem system performs slightly better on the scoring speech while being slightly less accurate on the ASR evaluation dataset. Furthermore, given its improved scoring performance while using fewer scoring features, the Tandem system is more robust for the scoring task than the DNN one.
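The Tandem system in this abstract relies on "bottleneck features": activations of a deliberately narrow hidden layer of a DNN, which then serve as input features for a conventional recognizer. The following toy NumPy sketch illustrates only that extraction step; the layer sizes, weights, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# A toy DNN: 40-dim acoustic frames -> 512 units -> 13-unit bottleneck.
W1 = rng.standard_normal((40, 512)) * 0.05
W2 = rng.standard_normal((512, 13)) * 0.05   # the narrow bottleneck layer

def bottleneck_features(frames):
    """Return the 13-dim bottleneck activations for each input frame."""
    h1 = relu(frames @ W1)
    return relu(h1 @ W2)   # in a Tandem setup these feed the GMM front end

frames = rng.standard_normal((100, 40))      # 100 toy acoustic frames
bnf = bottleneck_features(frames)
print(bnf.shape)                             # (100, 13)
```

In a real Tandem system the DNN would first be trained as a phone classifier (with more layers above the bottleneck), and the bottleneck activations would typically be decorrelated before being modeled by the GMM.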
ieee automatic speech recognition and understanding workshop | 2015
Zhou Yu; Vikram Ramanarayanan; David Suendermann-Oeft; Xinhao Wang; Klaus Zechner; Lei Chen; Jidong Tao; Aliaksei Ivanou; Yao Qian
We introduce a new method to grade non-native spoken language tests automatically. Traditional automated response grading approaches use manually engineered time-aggregated features (such as the mean length of pauses). We propose to incorporate general time-sequence features (such as pitch), which preserve more information than time-aggregated features and do not require human effort to design. We use a type of recurrent neural network to jointly optimize the learning of high-level abstractions from time-sequence features together with the time-aggregated features. We first automatically learn high-level abstractions from time-sequence features with a Bidirectional Long Short-Term Memory (BLSTM) network and then combine those abstractions with time-aggregated features in a Multilayer Perceptron (MLP)/Linear Regression (LR) model. We optimize the BLSTM and the MLP/LR jointly. We find that such models reach the best performance in terms of correlation with human raters. We also find that when only limited time-aggregated features are available, our model that incorporates time-sequence features improves performance drastically.
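The architecture described above can be sketched as follows: a BLSTM learns abstractions from frame-level time-sequence features, its final states are concatenated with the hand-crafted time-aggregated features, and an MLP regressor produces the score, with one optimizer updating both parts jointly. This is a minimal PyTorch sketch under assumed dimensions; all names and sizes are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class JointScorer(nn.Module):
    def __init__(self, seq_dim=3, agg_dim=10, hidden=32):
        super().__init__()
        # BLSTM over time-sequence features (e.g. frame-level pitch).
        self.blstm = nn.LSTM(seq_dim, hidden, batch_first=True,
                             bidirectional=True)
        # MLP over [BLSTM abstraction ; time-aggregated features].
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden + agg_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar proficiency score
        )

    def forward(self, seq, agg):
        _, (h, _) = self.blstm(seq)              # h: (2, batch, hidden)
        abstraction = torch.cat([h[0], h[1]], dim=-1)
        return self.mlp(torch.cat([abstraction, agg], dim=-1))

model = JointScorer()
seq = torch.randn(4, 100, 3)   # 4 responses, 100 frames, 3 seq features
agg = torch.randn(4, 10)       # 10 time-aggregated features per response
score = model(seq, agg)        # one backward pass trains BLSTM and MLP jointly
print(score.shape)             # torch.Size([4, 1])
```

Training with a single optimizer over `model.parameters()` is what makes the optimization joint: gradients from the scoring loss flow back through the MLP into the BLSTM.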
annual meeting of the special interest group on discourse and dialogue | 2015
Alexei V. Ivanov; Vikram Ramanarayanan; David Suendermann-Oeft; Melissa Lopez; Keelan Evanini; Jidong Tao
Dialogue interaction with remote interlocutors is a difficult application area for speech recognition technology because of the limited duration of acoustic context available for adaptation, the narrow-band, compressed signal encoding used in telecommunications, the high variability of spontaneous speech, and processing time constraints. It is even more difficult when interacting with non-native speakers because of broader allophonic variation, less canonical prosodic patterns, a higher rate of false starts and incomplete words, unusual word choices, and a lower probability of grammatically well-formed sentences. We present a comparative study of various approaches to speech recognition in a non-native context. Comparing systems in terms of their accuracy and real-time factor, we find that a Kaldi-based Deep Neural Network Acoustic Model (DNN-AM) system with online speaker adaptation by far outperforms the other available methods.
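The two evaluation axes in this comparison, accuracy and real-time factor, are both simple to compute. The sketch below shows the standard definitions: real-time factor as processing time over audio duration, and word error rate as word-level edit distance over reference length. The numbers in the usage lines are made-up illustrations, not results from the paper.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF < 1 means the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds

def word_error_rate(ref, hyp):
    """Levenshtein word distance over reference length; substitutions,
    insertions, and deletions all cost 1."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(real_time_factor(30.0, 60.0))                  # 0.5
print(word_error_rate("the cat sat".split(),
                      "the cat sat down".split()))   # 1 insertion / 3 words
```

The trade-off the paper examines is between these two: a more accurate system (lower WER) is only usable in live dialogue if its RTF also stays near or below 1.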
conference of the international speech communication association | 2016
Yao Qian; Jidong Tao; David Suendermann-Oeft; Keelan Evanini; Alexei V. Ivanov; Vikram Ramanarayanan
Recently, text-independent speaker recognition systems with phonetically aware DNNs, which allow comparison among different speakers with "soft-aligned" phonetic content, have significantly outperformed standard i-vector based systems [912]. However, when applied to speaker recognition on a non-native spontaneous speech corpus, DNN-based speaker recognition does not show superior performance, due to the relatively low accuracy of phonetic content recognition. In this paper, noise-aware features and multi-task learning are investigated to improve the alignment of speech feature frames into the sub-phonemic "senone" space and to "distill" the L1 (native language) information of the test takers into bottleneck features (BNFs), which we refer to as metadata-sensitive BNFs. Experimental results show that the system with metadata-sensitive BNFs improves speaker recognition performance by a 23.9% relative reduction in equal error rate (EER) compared to the baseline i-vector system. In addition, the L1 information is used only to train the BNF extractor, so it is not needed as input for BNF extraction, i-vector extraction, or scoring on the enrollment and evaluation sets, which avoids relying on erroneous L1s claimed by impostors.
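The equal error rate (EER) reported above is the operating point where the false-acceptance rate equals the false-rejection rate as the decision threshold is swept. A small self-contained NumPy sketch of that computation, on toy scores rather than data from the paper:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: threshold where false-acceptance rate == false-rejection rate.
    `scores` are higher for more likely targets; `labels` are 1 for target
    trials and 0 for impostor trials."""
    order = np.argsort(scores)[::-1]             # descending score order
    labels = np.asarray(labels)[order]
    n_target = labels.sum()
    n_impostor = len(labels) - n_target
    # Accepting the top-k scores for every k sweeps the decision threshold.
    fr = 1.0 - np.cumsum(labels) / n_target      # targets rejected so far
    fa = np.cumsum(1 - labels) / n_impostor      # impostors accepted so far
    i = np.argmin(np.abs(fr - fa))               # closest crossing point
    return (fr[i] + fa[i]) / 2.0

# Perfectly separated toy scores give an EER of 0.
print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 0.0
```

A "23.9% relative reduction" then means the new system's EER is 23.9% lower than the baseline's EER, not 23.9 percentage points lower.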
WOCCI | 2014
Keelan Evanini; Youngsoon So; Jidong Tao; Diego Zapata-Rivera; Christine Luce; Laura Battistini; Xinhao Wang
international conference on acoustics, speech, and signal processing | 2018
Lei Chen; Jidong Tao; Shabnam Ghaffarzadegan; Yao Qian
Archive | 2018
Yao Qian; Jidong Tao; David Suendermann-Oeft; Keelan Evanini; Alexei V. Ivanov; Vikram Ramanarayanan
ETS Research Report Series | 2018
Lei Chen; Klaus Zechner; Su-Youn Yoon; Keelan Evanini; Xinhao Wang; Anastassia Loukina; Jidong Tao; Lawrence Davis; Chong Min Lee; Min Ma; Robert Mundkowsky; Chi Lu; Chee Wee Leong; Binod Gyawali
Archive | 2017
Jidong Tao; Lei Chen; Chong Min Lee
ETS Research Report Series | 2017
Anastassia Loukina; Klaus Zechner; Su-Youn Yoon; Mo Zhang; Jidong Tao; Xinhao Wang; Chong Min Lee; Matthew Mulholland