Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ta Li is active.

Publication


Featured research published by Ta Li.


International Symposium on Neural Networks | 2009

Improving Voice Search Using Forward-Backward LVCSR System Combination

Ta Li; Changchun Bao; Weiqun Xu; Jielin Pan; Yonghong Yan

Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules of a voice search system. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve our system, we propose a forward-backward LVCSR system combination method to decrease the search errors in speech recognition. This also helps to improve spoken language understanding (SLU) performance. Experimental results show that our proposed method improves speech recognition performance by a 5.7% relative character error rate (CER) reduction and increases the F1-measure of SLU by 1.5% absolute on our test set.
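
The paper publishes no code; as a rough illustration of the general idea of combining hypotheses from a forward decoder and a backward (time-reversed) decoder, the sketch below merges two n-best lists and re-ranks them by an interpolated score. The function names, the interpolation weight, and the floor value are assumptions for illustration, not the authors' method.

```python
# Hypothetical sketch of forward-backward hypothesis combination.
# Each decoder is assumed to return an n-best list of
# (hypothesis_text, log_score); hypotheses missing from one list
# receive a floor score. All names and weights are illustrative.

from typing import List, Tuple

def combine_nbest(forward: List[Tuple[str, float]],
                  backward: List[Tuple[str, float]],
                  alpha: float = 0.5,
                  floor: float = -1e9) -> List[Tuple[str, float]]:
    """Re-rank the union of two n-best lists by interpolated log-score."""
    fwd = dict(forward)
    bwd = dict(backward)
    combined = {}
    for hyp in set(fwd) | set(bwd):
        score = alpha * fwd.get(hyp, floor) + (1.0 - alpha) * bwd.get(hyp, floor)
        combined[hyp] = score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with made-up scores:
fwd_nbest = [("play jay chou", -12.3), ("play jade chow", -13.1)]
bwd_nbest = [("play jay chou", -11.8), ("played jay chou", -14.0)]
print(combine_nbest(fwd_nbest, bwd_nbest)[0])  # best combined hypothesis
```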


Fuzzy Systems and Knowledge Discovery | 2012

Optimized large vocabulary WFST speech recognition system

Yuhong Guo; Ta Li; Yujing Si; Jielin Pan; Yonghong Yan

The speech recognition decoder is an important part of large vocabulary speech recognition applications, and its speed and accuracy are the main concerns in practice. Recently, weighted finite-state transducers (WFSTs) have become the dominant representation of the decoding network. However, the large memory and time cost of constructing the final WFST decoding network is the bottleneck of this technique. The goal of this article is to construct a tight, flexible WFST decoding network as well as a fast, scalable decoder. A tight representation of silence in speech is proposed, and a decoding algorithm with improved pruning strategies is also suggested. The experimental results show that the proposed network representation cuts the memory cost of constructing the final decoding network by 37% and the time cost by 19%. With decoding strategies using WFST feature-specific beams, the proposed decoder's efficiency and accuracy are also significantly improved.
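
The abstract mentions improved pruning strategies but does not specify them; the sketch below shows a generic beam-pruning step of the kind used in token-passing WFST decoders (beam pruning plus histogram pruning). It is only an illustration of the concept, with hypothetical names and thresholds, not the authors' decoder.

```python
# Generic beam-pruning step for a token-passing decoder.
# `tokens` maps a decoder state id to its accumulated log-score.
# Anything falling more than `beam` below the best score is dropped,
# and at most `max_tokens` survivors are kept (histogram pruning).

from typing import Dict

def prune_tokens(tokens: Dict[int, float],
                 beam: float = 12.0,
                 max_tokens: int = 5000) -> Dict[int, float]:
    """Keep tokens within `beam` of the best score, capped at `max_tokens`."""
    if not tokens:
        return tokens
    best = max(tokens.values())
    survivors = {s: score for s, score in tokens.items() if score >= best - beam}
    if len(survivors) > max_tokens:
        # Histogram pruning: keep only the highest-scoring tokens.
        keep = sorted(survivors.items(), key=lambda kv: kv[1], reverse=True)[:max_tokens]
        survivors = dict(keep)
    return survivors
```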


International Conference on Natural Computation | 2014

Improved Mandarin spoken term detection by using deep neural network for keyword verification

Xuyang Wang; Ta Li; Yeming Xiao; Jielin Pan; Yonghong Yan

In this paper, we propose to use a Deep Neural Network (DNN), which has proved to be the state-of-the-art technique in speech recognition, to re-estimate the confidence of keyword hypotheses in the verification stage of spoken term detection. A speech recognition system based on a DNN outperforms one based on a conventional Gaussian Mixture Model (GMM) but suffers from increased decoding time. When the speed of decoding or indexing is critical, utilizing a DNN in keyword verification appears to be a trade-off between performance and speed. Inspired by the utilization and acceleration of DNNs in the decoding stage, we explore an efficient method to replace the GMM with a DNN in the verification stage. A 5% relative reduction in equal error rate (EER) is achieved, and the improvement in recall in the high-precision region is especially significant, which is essential for practical tasks. Meanwhile, the search time decreases by more than 50% compared to verification on a DNN without any refinements.
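
As a minimal sketch of what DNN-based confidence re-estimation for a keyword hypothesis can look like, the function below averages frame-level log-posteriors along a given state alignment. The alignment source, the posterior layout, and all names are assumptions; the paper's actual scoring and acceleration details are not reproduced here.

```python
# Illustrative keyword confidence from DNN posteriors, assuming the
# keyword's frame-level state alignment is known from the detection pass.

import numpy as np

def keyword_confidence(posteriors: np.ndarray, state_alignment: np.ndarray) -> float:
    """
    posteriors:      (num_frames, num_states) DNN output probabilities.
    state_alignment: (num_frames,) state index aligned to each frame.
    Returns the mean per-frame log-posterior along the alignment,
    used as a confidence score for accepting or rejecting the keyword.
    """
    frame_probs = posteriors[np.arange(len(state_alignment)), state_alignment]
    return float(np.mean(np.log(frame_probs + 1e-10)))

# A hypothesis would be accepted if its confidence exceeds a tuned threshold:
# accept = keyword_confidence(post, ali) > threshold
```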


Applied Mechanics and Materials | 2014

Compact Wake-Up Word Speech Recognition on Embedded Platforms

Anhao Xing; Ta Li; Jielin Pan; Yonghong Yan

The wake-up word speech recognition system is a new paradigm in the field of automatic speech recognition (ASR). This new paradigm is not yet widely recognized but is useful in many applications such as mobile phones and smart home systems. In this paper we describe the development of a compact wake-up word recognizer for embedded platforms. To keep resource cost low, a variety of simplification techniques are used. Speech feature observations are compressed to a lower dimension, and a simple distance-based template matching method is used in place of complex Viterbi scoring. We apply a double scoring method to achieve better performance, and a support vector machine classifier is used in cooperation with it. We achieve a performance improvement, with the false rejection rate reduced from 6.88% to 5.50% and the false acceptance rate reduced from 8.40% to 3.01%.
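
In the spirit of the "simple distance-based template matching" the abstract mentions, the sketch below computes a dynamic-time-warping (DTW) distance between a stored template and a query feature sequence. The features, the Euclidean frame distance, and the threshold decision are assumptions; the paper's double scoring and SVM stage are only referenced in the comments.

```python
# Minimal DTW template-matching sketch for wake-up word detection.
# template, query: (T, D) feature matrices (e.g. MFCC frames).

import numpy as np

def dtw_distance(template: np.ndarray, query: np.ndarray) -> float:
    """Return the length-normalized DTW alignment cost between two sequences."""
    T1, T2 = len(template), len(query)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(template[i - 1] - query[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Normalize by path length so scores are comparable across utterances.
    return float(cost[T1, T2] / (T1 + T2))

# Wake-up decision: fire if the distance to the stored template falls below
# a threshold (optionally refined by a second score and an SVM classifier,
# as the paper describes).
```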


International Conference on Natural Computation | 2012

Recurrent neural network language model in Mandarin voice input system

Yujing Si; Ta Li; Shang Cai; Jielin Pan; Yonghong Yan

Over more than three decades, the development of automatic speech recognition (ASR) technology has made it possible for some intelligent query systems to use a voice interface. In particular, the voice input system is a practical and interesting application of ASR. In this paper, we present our recent work on using a Recurrent Neural Network Language Model (RNNLM) to improve the performance of our Mandarin voice input system. The Mandarin voice input system employs a two-pass strategy. In the first pass, a memory-efficient state network and a tri-gram language model are used to generate the word lattice from which the n-best list is extracted. In the second pass, we use a large 4-gram language model and the RNNLM to re-rank the n-best list and then output the new best hypothesis. Experiments showed that the RNNLM is very effective for n-best list rescoring. Eventually, a 10.2% relative reduction in word error rate (from 13.7% to 12.3%) was achieved on a voice search task, compared to the result of the first pass.
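
The second-pass rescoring step can be sketched as below: each n-best hypothesis is rescored with an interpolation of a 4-gram LM and an RNNLM, combined with the first-pass acoustic score. The functions `ngram_logprob` and `rnnlm_logprob` are placeholders, and the LM scale and interpolation weight are assumed values, not those of the paper.

```python
# Hedged sketch of n-best rescoring with an interpolated language model.

from typing import Callable, List, Tuple

def rescore_nbest(nbest: List[Tuple[str, float]],          # (hypothesis, acoustic log-score)
                  ngram_logprob: Callable[[str], float],   # placeholder 4-gram LM scorer
                  rnnlm_logprob: Callable[[str], float],   # placeholder RNNLM scorer
                  lm_scale: float = 12.0,
                  lam: float = 0.5) -> str:
    """Return the hypothesis with the best combined acoustic + LM score."""
    def total_score(hyp: str, am_score: float) -> float:
        lm = lam * rnnlm_logprob(hyp) + (1.0 - lam) * ngram_logprob(hyp)
        return am_score + lm_scale * lm
    return max(nbest, key=lambda item: total_score(item[0], item[1]))[0]
```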


International Conference on Natural Computation | 2011

Development of a Chinese song name recognition system

Shang Cai; Zhen Zhang; Ta Li; Jielin Pan; Yonghong Yan

The development of automatic speech recognition (ASR) technology in recent years has made it possible for some intelligent query systems to use a voice interface. Automatic song selection is a practical and interesting application of ASR. In this paper we describe our efforts to build and improve a Chinese song name recognition system. It is a large-vocabulary, speaker-independent system currently in commercial use, and a typical example of large-list recognition tasks. We use a new paradigm for large-list recognition. In this framework, the spoken query is first recognized by an LVCSR module and the recognized result is then used to search for the final song name. Unlike transcription tasks, such as the Switchboard task, our LVCSR module is an in-domain application. We present some innovative optimizations that improve the song name recognition accuracy. The experimental results show that these techniques yield relative error rate reductions of 7.36%, 26.87% and 32.71% for top-1, top-5 and top-10 results respectively over the conventional grammar-constrained recognizer.


International Conference on Education Technology and Computer | 2010

Large list recognition using voice search framework

Ta Li; Shang Cai; Jielin Pan; Yonghong Yan

Large-list recognition is usually considered an automatic speech recognition (ASR) problem. In this paper we propose a method to deal with this problem under the voice search framework. In this framework, the spoken utterance is first converted to text by a speech recognizer, and the text is then used to search for the final list entry. We develop an efficient front-end recognizer and propose a back-end three-pass fuzzy retrieval method. The experimental results on the Chinese song name recognition task show that our system gains relative error rate reductions of 4.94%, 26.12% and 32.48% for top-1, top-5 and top-10 results respectively over the conventional grammar-constrained recognizer.
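
The back-end retrieval idea (matching recognizer output against a large list of entries) can be illustrated with a simple fuzzy match by normalized edit distance, as in the sketch below. This is only a generic illustration; the paper's actual three-pass fuzzy retrieval method is not reproduced, and all names are hypothetical.

```python
# Fuzzy retrieval sketch: rank song names by normalized edit distance
# to the recognizer's text output and return the top-k candidates.

from typing import List, Tuple

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def fuzzy_retrieve(query: str, song_list: List[str], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k song names closest to the recognized query."""
    scored = [(s, edit_distance(query, s) / max(len(query), len(s), 1)) for s in song_list]
    return sorted(scored, key=lambda kv: kv[1])[:k]
```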


Fuzzy Systems and Knowledge Discovery | 2009

Improving Automatic Speech Recognizer of Voice Search Using System Combination

Ta Li; Weiqun Xu; Jielin Pan; Yonghong Yan

Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules of a voice search system. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve our system, we propose a forward-backward LVCSR system combination method to decrease the search errors in speech recognition. Experimental results show that our proposed method improves speech recognition performance by a 5.7% relative character error rate (CER) reduction.


International Conference on Natural Computation | 2014

Speeding up deep neural networks for speech recognition on ARM Cortex-A series processors

Anhao Xing; Xin Jin; Ta Li; Xuyang Wang; Jielin Pan; Yonghong Yan

A new acoustic model based on deep neural networks (DNN) has been introduced recently and outperforms the conventional Gaussian mixture model (GMM) in speech recognition on several tasks. However, the number of parameters required by a DNN model is much larger than that of its GMM counterpart, and the excessive computation cost hinders the deployment of DNNs in many scenarios. In this paper, a DNN-based speech recognizer is implemented on an embedded platform. To reduce model size and computation cost, the DNN model is converted from floating-point to fixed-point and NEON instructions are applied. The speed is further improved by downsizing the DNN model via singular value decomposition (SVD) reconstruction. The work is evaluated on an ARM Cortex-A7 platform, where a 12x reduction in model size and a 15x speedup in computation are achieved without any noticeable accuracy loss.
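
The two size/speed reductions named in the abstract can be sketched as follows: a low-rank SVD reconstruction of a weight matrix, and quantization of the resulting factors to fixed-point. The layer size, rank, and int16 format are assumptions for illustration; the paper's exact quantization scheme and NEON kernels are not reproduced.

```python
# Illustrative DNN layer compression: SVD low-rank factorization
# followed by fixed-point (int16) quantization of the factors.

import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Replace W (m x n) with two smaller factors A (m x r) and B (r x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (m, rank), columns scaled by singular values
    B = Vt[:rank, :]             # (rank, n)
    return A, B

def to_fixed_point(W: np.ndarray):
    """Quantize a float matrix to signed 16-bit fixed-point with a shared scale."""
    qmax = 2 ** 15 - 1
    scale = np.max(np.abs(W)) / qmax
    return np.round(W / scale).astype(np.int16), scale

# Example: a 1024x1024 layer reduced to rank 128 and quantized.
W = np.random.randn(1024, 1024).astype(np.float32)
A, B = svd_compress(W, rank=128)
Aq, a_scale = to_fixed_point(A)
Bq, b_scale = to_fixed_point(B)
# At run time the layer computes x @ (A @ B) ~= ((x @ Aq) @ Bq) * a_scale * b_scale.
```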


Journal of the Acoustical Society of America | 2012

Voice search optimization using weighted finite-state transducers

Yuhong Guo; Ta Li; Yujing Si; Jielin Pan; Yonghong Yan

A voice search system can provide users with information according to their spoken queries. However, the high word error rate of the automatic speech recognition (ASR) module, the most important part of such a system, degrades the whole system's performance. Moreover, the runtime efficiency of the ASR also becomes the bottleneck in large-scale deployment of voice search. In this paper, an optimized weighted finite-state transducer (WFST) based voice search system is proposed. A weighted parallel silence/short-pause model is introduced to reduce both the final transducer size and the word error rate, and the WFST network is optimized as well. The experimental results show that the proposed system outperforms the other recognition system in recognition speed at an equal word error rate, and the oracle error rate is also significantly reduced. This work is partially supported by the National Natural Science Foundation of China (Nos. 10925419, 90920302, 10874203, 60875014, 61072124, 11074275, 11161140319).

Collaboration


Dive into Ta Li's collaborations.

Top Co-Authors

Jielin Pan (Chinese Academy of Sciences)
Yonghong Yan (Chinese Academy of Sciences)
Yujing Si (Chinese Academy of Sciences)
Qingqing Zhang (Chinese Academy of Sciences)
Changchun Bao (Chinese Academy of Sciences)
Shang Cai (Chinese Academy of Sciences)
Weiqun Xu (Chinese Academy of Sciences)
Xuyang Wang (Chinese Academy of Sciences)
Yali Li (Chinese Academy of Sciences)
Yuhong Guo (Chinese Academy of Sciences)