Ta Li
Chinese Academy of Sciences
Publications
Featured research published by Ta Li.
international symposium on neural networks | 2009
Ta Li; Changchun Bao; Weiqun Xu; Jielin Pan; Yonghong Yan
Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules in a voice search system. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve the system, we propose a forward-backward LVCSR system combination method that decreases search errors in speech recognition and also helps to improve spoken language understanding (SLU) performance. Experimental results show that the proposed method improves speech recognition by a 5.7% relative CER reduction and raises the F1-measure of SLU by 1.5% absolute on our test set.
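The core idea of system combination is that hypotheses proposed by both the forward and the backward decoder are more likely to be correct. The paper operates on full LVCSR lattices; the toy sketch below, with hypothetical hypothesis strings and scores, merges two n-best lists by summed score so agreement between the two passes is rewarded.

```python
# Illustrative sketch only: the paper's forward-backward combination
# works on decoder lattices, not flat n-best dicts as shown here.

def combine_nbest(forward, backward):
    """forward/backward: dicts mapping hypothesis text -> score (higher = better)."""
    merged = dict(forward)
    for hyp, score in backward.items():
        # Sum scores when both decoders propose the same hypothesis,
        # so cross-decoder agreement is rewarded.
        merged[hyp] = merged.get(hyp, 0.0) + score
    return max(merged, key=merged.get)

fwd = {"play jazz music": 0.6, "play jess music": 0.4}
bwd = {"play jazz music": 0.5, "pay jazz music": 0.5}
print(combine_nbest(fwd, bwd))  # "play jazz music" (combined score 1.1)
```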
fuzzy systems and knowledge discovery | 2012
Yuhong Guo; Ta Li; Yujing Si; Jielin Pan; Yonghong Yan
The decoder is an important component of large vocabulary speech recognition applications, and its speed and accuracy are the main practical concerns. Recently, the weighted finite-state transducer (WFST) has become the dominant representation of the decoding network. However, the large memory and time cost of constructing the final WFST decoding network is the bottleneck of this technique. The goal of this article is to construct a tight, flexible WFST decoding network as well as a fast, scalable decoder. A tight representation of silence in speech is proposed, and a decoding algorithm with improved pruning strategies is also suggested. The experimental results show that the proposed network representation cuts the memory cost of constructing the final decoding network by 37% and the time cost by 19%. With WFST-feature-specific decoding beams, the proposed decoder's efficiency and accuracy are also significantly improved.
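Beam pruning is the basic mechanism behind the decoding strategies mentioned above: at each frame, only hypotheses whose accumulated cost is within a beam width of the best one survive. The paper tunes beams per WFST feature; the sketch below (with hypothetical state names and costs) shows only the single global beam case.

```python
# Illustrative sketch of beam pruning in a frame-synchronous decoder.
# The paper uses WFST-feature-specific beams; a single global beam is
# shown here for simplicity.

def prune(hyps, beam=5.0):
    """hyps: dict mapping decoder state -> accumulated cost (lower = better).
    Keep only states within `beam` of the best cost this frame."""
    best = min(hyps.values())
    return {s: c for s, c in hyps.items() if c <= best + beam}

frame_hyps = {"s1": 10.0, "s2": 12.5, "s3": 18.0}
print(prune(frame_hyps))  # "s3" falls outside the beam and is dropped
```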
international conference on natural computation | 2014
Xuyang Wang; Ta Li; Yeming Xiao; Jielin Pan; Yonghong Yan
In this paper, we propose to use the Deep Neural Network (DNN), which has proved to be the state-of-the-art technique in speech recognition, to re-estimate the confidence of keyword hypotheses in the verification stage of spoken term detection. A speech recognition system based on a DNN outperforms one based on the conventional Gaussian Mixture Model (GMM) but suffers from increased decoding time. When the speed of decoding or indexing is critical, utilizing a DNN in keyword verification appears to be a trade-off between performance and speed. Inspired by the utilization and acceleration of DNNs in the decoding stage, we explore an efficient method to replace the GMM with a DNN in the verification stage. A 5% relative reduction in equal error rate (EER) is achieved, and the improvement of recall in the high-precision region is especially significant, which is essential for practical tasks. Meanwhile, the search time decreases by more than 50% compared to DNN-based verification without any refinements.
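A common way to turn acoustic-model posteriors into a keyword confidence, which the sketch below assumes (the paper's actual verification pipeline is more involved), is the geometric mean of the per-frame posteriors of the hypothesized keyword, compared against a threshold.

```python
import math

# Hypothetical sketch of confidence re-estimation: score a keyword
# hypothesis by the geometric mean of the posteriors its frames
# receive from the acoustic model (DNN or GMM). Function names and
# the 0.5 threshold are illustrative assumptions.

def keyword_confidence(frame_posteriors):
    """frame_posteriors: per-frame posterior probability of the
    hypothesized keyword's states."""
    log_sum = sum(math.log(p) for p in frame_posteriors)
    return math.exp(log_sum / len(frame_posteriors))  # geometric mean

def verify(frame_posteriors, threshold=0.5):
    return keyword_confidence(frame_posteriors) >= threshold

print(verify([0.9, 0.8, 0.95]))  # uniformly confident frames -> True
print(verify([0.9, 0.05, 0.2]))  # weak frames drag confidence down -> False
```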
Applied Mechanics and Materials | 2014
An Hao Xing; Ta Li; Jie Lin Pan; Yong Hong Yan
The wake-up word speech recognition system is a new paradigm in the field of automatic speech recognition (ASR). This paradigm is not yet widely recognized but is useful in many applications such as mobile phones and smart home systems. In this paper we describe the development of a compact wake-up word recognizer for embedded platforms. To keep resource cost low, a variety of simplification techniques are used: speech feature observations are compressed to a lower dimension, and a simple distance-based template matching method is used in place of complex Viterbi scoring. We apply a double scoring method to achieve better performance, and a support vector machine classifier is used to cooperate with it. We achieve a performance improvement, with the false rejection rate reduced from 6.88% to 5.50% and the false acceptance rate reduced from 8.40% to 3.01%.
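The distance-based template matching idea can be sketched as follows. This is a deliberately simplified, hypothetical version: a real recognizer would compare multi-dimensional feature vectors with dynamic time warping to handle timing differences, and the paper adds the double scoring pass and SVM classifier on top, none of which is shown here.

```python
# Illustrative sketch only: scalar "features", naive frame-by-frame
# distance instead of DTW, and an assumed detection threshold.

def template_distance(features, template):
    """Mean absolute frame distance between an input feature sequence
    and a stored wake-up word template (truncated to the shorter)."""
    n = min(len(features), len(template))
    return sum(abs(f - t) for f, t in zip(features[:n], template[:n])) / n

def is_wake_word(features, templates, threshold=1.0):
    # Accept if any stored template is close enough to the input.
    return min(template_distance(features, t) for t in templates) < threshold

templates = [[0.1, 0.5, 0.9], [0.2, 0.6, 1.0]]
print(is_wake_word([0.15, 0.55, 0.95], templates))  # close to template 1 -> True
print(is_wake_word([2.0, 2.0, 2.0], templates))     # far from both -> False
```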
international conference on natural computation | 2012
Yujing Si; Ta Li; Shang Cai; Jielin Pan; Yonghong Yan
Over more than three decades, the development of automatic speech recognition (ASR) technology has made it possible for some intelligent query systems to use a voice interface. In particular, voice input is a practical and interesting application of ASR. In this paper, we present our recent work on using a Recurrent Neural Network Language Model (RNNLM) to improve the performance of our Mandarin voice input system. The system employs a two-pass strategy. In the first pass, a memory-efficient state network and a tri-gram language model are used to generate the word lattice from which the n-best list is extracted. In the second pass, a large 4-gram language model and the RNNLM re-rank the n-best list and output the new best hypothesis. Experiments showed that RNNLM-based n-best rescoring was very effective: a 10.2% relative reduction in word error rate (from 13.7% to 12.3%) was achieved on a voice search task compared to the result of the first pass.
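The second pass above can be sketched as an interpolated rescoring step: each n-best hypothesis gets a combined score from the n-gram LM and the RNNLM, and the list is re-ranked. Both scorers are stubbed with made-up log-probabilities below; in the real system they come from the trained 4-gram LM and RNNLM, and the interpolation weight is an assumption.

```python
# Illustrative n-best rescoring sketch; hypothesis strings and scores
# are fabricated for the example.

def rescore(nbest, ngram_scores, rnnlm_scores, lam=0.5):
    """nbest: list of hypothesis strings; both score dicts map a
    hypothesis to its log-probability. Returns the best hypothesis
    under the linearly interpolated score."""
    def combined(hyp):
        return lam * ngram_scores[hyp] + (1 - lam) * rnnlm_scores[hyp]
    return max(nbest, key=combined)

nbest = ["beijing weather today", "beijing whether today"]
ngram = {"beijing weather today": -12.0, "beijing whether today": -11.5}
rnnlm = {"beijing weather today": -10.0, "beijing whether today": -14.0}
# The RNNLM strongly prefers the first hypothesis, flipping the
# first-pass ranking.
print(rescore(nbest, ngram, rnnlm))  # "beijing weather today"
```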
international conference on natural computation | 2011
Shang Cai; Zhen Zhang; Ta Li; Jielin Pan; Yonghong Yan
The development of automatic speech recognition (ASR) technology in recent years has made it possible for some intelligent query systems to use a voice interface. Automatic song selection is a practical and interesting application of ASR. In this paper we describe our efforts to build and improve a Chinese song name recognition system. It is a large-vocabulary, speaker-independent system currently in commercial use, and a typical example of a large list recognition task. We use a new paradigm for large list recognition: the spoken query is first recognized by an LVCSR module, and the recognized result is then used to search for the final song name. Unlike transcription tasks such as Switchboard, our LVCSR module is an in-domain application. We present some innovative optimizations that improve song name recognition accuracy. Experimental results show that these techniques yield relative error rate reductions of 7.36%, 26.87% and 32.71% for top-1, top-5 and top-10 results respectively over the conventional grammar-constrained recognizer.
international conference on education technology and computer | 2010
Ta Li; Shang Cai; Jielin Pan; Yonghong Yan
Large list recognition is usually considered an automatic speech recognition (ASR) problem. In this paper we propose a method to deal with this problem under the voice search framework: the spoken utterance is first converted to text by a speech recognizer, and the text is then used to search for the final list entry. We develop an efficient front-end recognizer and propose a back-end three-pass post fuzzy retrieval method. Experimental results on a Chinese song name recognition task show that our system gains relative error rate reductions of 4.94%, 26.12% and 32.48% for top-1, top-5 and top-10 results respectively over the conventional grammar-constrained recognizer.
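The core of fuzzy retrieval against a large list is tolerating recognition errors in the query text. One standard way to do that, which the single-pass sketch below assumes (the paper's back end uses a three-pass scheme whose details are not shown), is to rank list entries by edit distance to the recognized string.

```python
# Illustrative fuzzy-retrieval sketch: rank song names by Levenshtein
# edit distance to the recognized query. Song titles are made up.

def edit_distance(a, b):
    # Standard dynamic-programming Levenshtein distance with a
    # single rolling row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[len(b)]

def top_k(query, song_list, k=5):
    """Return the k list entries closest to the recognized query."""
    return sorted(song_list, key=lambda s: edit_distance(query, s))[:k]

songs = ["yesterday", "yellow river", "let it be"]
# A misrecognized query still retrieves the intended song.
print(top_k("yesterdey", songs, k=1))  # ['yesterday']
```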
fuzzy systems and knowledge discovery | 2009
Ta Li; Weiqun Xu; Jielin Pan; Yonghong Yan
Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules in a voice search system. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve the system, we propose a forward-backward LVCSR system combination method to decrease search errors in speech recognition. Experimental results show that the proposed method improves speech recognition by a 5.7% relative character error rate (CER) reduction.
international conference on natural computation | 2014
Anhao Xing; Xin Jin; Ta Li; Xuyang Wang; Jielin Pan; Yonghong Yan
A new acoustic model based on the deep neural network (DNN) has been introduced recently and outperforms the conventional Gaussian mixture model (GMM) in speech recognition on several tasks. However, the number of parameters required by a DNN model is much larger than that of its GMM counterpart, and the excessive computation cost hinders the deployment of DNNs in many scenarios. In this paper, a DNN-based speech recognizer is implemented on an embedded platform. To reduce model size and computation cost, the DNN model is converted from floating-point to fixed-point and NEON instructions are applied. The speed is further improved by downsizing the DNN model via singular value decomposition (SVD) reconstruction. The work is evaluated on an ARM Cortex-A7 platform, where a 12x reduction in model size and a 15x computation speedup are achieved without any noticeable accuracy loss.
Journal of the Acoustical Society of America | 2012
Yuhong Guo; Ta Li; Yujing Si; Jielin Pan; Yonghong Yan
A voice search system provides users with information according to their spoken queries. However, the high word error rate of the automatic speech recognition (ASR) module, the most important part of such a system, degrades the whole system's performance. Moreover, the runtime efficiency of the ASR also becomes the bottleneck in large-scale applications of voice search. In this paper, an optimized weighted finite-state transducer (WFST) based voice search system is proposed. A weighted parallel silence/short-pause model is introduced to reduce both the final transducer size and the word error rate, and the WFST network is optimized as well. The experimental results show that the proposed system decodes faster than the other recognition system at equal word error rate, and the oracle error rate is also significantly reduced. This work is partially supported by the National Natural Science Foundation of China (Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319).