Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jielin Pan is active.

Publication


Featured research published by Jielin Pan.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Mandarin-English bilingual Speech Recognition for real world music retrieval

Qingqing Zhang; Jielin Pan; Yonghong Yan

This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual speech recognition system (MESRS) for real world music retrieval. To balance the performance and the complexity of the bilingual system, a unified single set of bilingual acoustic models derived by phone clustering is developed. A novel two-pass phone clustering method based on a confusion matrix (TCM) is presented and compared with the log-likelihood measure method. To handle Mandarin-accented spoken English, different non-native adaptation approaches are investigated. With the effective incorporation of phone clustering and non-native adaptation, the phrase error rate (PhrER) of MESRS on English utterances was reduced by 24.5% relative to the baseline monolingual English system, while the PhrER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system; on bilingual code-mixing utterances the system achieved a 22.4% relative PhrER reduction.
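The confusion-matrix clustering idea can be sketched as follows. This is a minimal greedy illustration under assumed phone labels, a symmetric averaged confusion score, and a single merging loop; it is not the paper's exact two-pass TCM procedure.

```python
def cluster_phones(phones, confusion, threshold=0.3):
    """Greedily merge phone clusters whose mutual confusion is high.

    confusion[a][b] = P(recognized as b | spoken a); threshold and
    labels are illustrative assumptions.
    """
    clusters = [{p} for p in phones]
    merged = True
    while merged:
        merged = False
        best, best_score = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # symmetric confusion between two clusters, averaged over members
                score = sum(confusion[a].get(b, 0) + confusion[b].get(a, 0)
                            for a in clusters[i] for b in clusters[j])
                score /= 2 * len(clusters[i]) * len(clusters[j])
                if score > best_score:
                    best_score, best = score, (i, j)
        if best:
            i, j = best
            clusters[i] |= clusters.pop(j)  # j > i, so pop(j) is safe
            merged = True
    return clusters
```

For example, an English and a Mandarin vowel that the recognizer frequently confuses would end up sharing one acoustic model, while a distinct consonant stays in its own cluster.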


International Conference on Acoustics, Speech, and Signal Processing | 2010

Improved modeling for F0 generation and V/U decision in HMM-based TTS

Qingqing Zhang; Frank K. Soong; Yao Qian; Zhi-Jie Yan; Jielin Pan; Yonghong Yan

HMM-based TTS can produce a highly intelligible voice of decent quality. However, the synthesized speech sometimes exhibits perceptibly annoying glitches due to F0 extraction errors in the training data and voiced/unvoiced swapping errors in F0 generation. In conventional MSD-based F0 modeling [10], the two incompatible probability spaces, a continuous probability density for voiced observations and a discrete probability for unvoiced observations, prevent us from using likelihood-based frame occupancy to alleviate the deteriorating effect of F0 extraction errors when training a more robust model for synthesis. In this paper, we propose a new approach to modeling the piece-wise continuous F0 trajectory and the V/U decision for HMM-based TTS. Voicing strength, characterized by the normalized correlation coefficient magnitude calculated in F0 feature extraction, is used as an additional feature in F0 modeling and for the V/U decision. Experimental results show that the new approach to F0 modeling and generation significantly outperforms the MSD-HMM method and a recently proposed GTD-HMM method [9]. The improvements are both objectively measurable and subjectively perceivable.
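The voicing strength feature, i.e. the normalized correlation coefficient magnitude from F0 extraction, can be sketched as a single-frame computation. The lag range and framing below are assumptions for illustration, not the paper's exact front end.

```python
import math

def voicing_strength(frame, lag_min=20, lag_max=160):
    """Peak of |normalized autocorrelation| over candidate pitch lags.

    Near 1.0 for strongly periodic (voiced) frames, near 0 for noise-like
    (unvoiced) frames. Lag bounds are illustrative (samples, not ms).
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]          # remove DC offset
    best = 0.0
    for lag in range(lag_min, min(lag_max, len(x) - 1)):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0:
            best = max(best, abs(num) / den)
    return best
```

A periodic frame scores close to 1 (the autocorrelation peaks at the pitch period), which is what makes this value usable both as an F0 modeling feature and as a soft V/U cue.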


Fuzzy Systems and Knowledge Discovery | 2009

Investigations to Minimum Phone Error Training in Bilingual Speech Recognition

Ran Xu; Qingqing Zhang; Jielin Pan; Yonghong Yan

The great success of the Minimum Phone Error (MPE) training criterion in mono-language large vocabulary continuous speech recognition (LVCSR) tasks motivates us to apply it to bilingual LVCSR systems. In this paper, in conjunction with previously reported bilingual phoneme inventory construction techniques, we comprehensively investigate the performance of MPE/fMPE on various Mandarin-English bilingual test sets under different test conditions. The evaluation results show that the final fMPE+MPE model achieves significant improvements over the baseline models. On the mono-language test sets, the best improvement is a relative error rate reduction of 28.4%; on the code-mixing test set, it achieves a relative error rate reduction of 8.1%. The within- and cross-language substitution error rates introduced in this paper also explicitly show that fMPE/MPE training can effectively improve the models' within- and cross-language discriminability in our bilingual recognition tasks.


International Symposium on Neural Networks | 2009

Improving Voice Search Using Forward-Backward LVCSR System Combination

Ta Li; Changchun Bao; Weiqun Xu; Jielin Pan; Yonghong Yan

Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules of a voice search system. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve the system, we propose a forward-backward LVCSR system combination method to decrease search errors in speech recognition. This also helps to improve spoken language understanding (SLU) performance. Experimental results show that the proposed method improves speech recognition performance by a 5.7% relative CER reduction and increases the F1-measure of SLU by 1.5% absolute on our test set.
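The combination step can be sketched at the word level as below. This is a toy illustration that assumes the forward and backward decoders' hypotheses are already aligned word by word with per-word confidences; a real system combination would align lattices or N-best lists first, and the paper's exact scheme may differ.

```python
def combine(fwd, bwd):
    """Combine two aligned 1-best hypotheses by confidence voting.

    fwd, bwd: lists of (word, confidence) from the forward and backward
    decoders. Where the decoders disagree, keep the more confident word.
    """
    out = []
    for (wf, cf), (wb, cb) in zip(fwd, bwd):
        out.append(wf if cf >= cb else wb)
    return out
```

The intuition is that the two decoders make partly independent search errors, so a word one decoder gets wrong with low confidence can be rescued by the other.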


Fuzzy Systems and Knowledge Discovery | 2012

Optimized large vocabulary WFST speech recognition system

Yuhong Guo; Ta Li; Yujing Si; Jielin Pan; Yonghong Yan

The speech recognition decoder is an important part of large vocabulary speech recognition applications, and its speed and accuracy are the main concerns. Recently, the weighted finite state transducer (WFST) has become the dominant representation of the decoding network. However, the large memory and time cost of constructing the final WFST decoding network is the bottleneck of this technique. The goal of this article is to construct a tight, flexible WFST decoding network as well as a fast, scalable decoder. A tight representation of silence in speech is proposed, and a decoding algorithm with improved pruning strategies is also suggested. The experimental results show that the proposed network representation cuts the memory cost of constructing the final decoding network by 37% and the time cost by 19%. With decoding strategies using WFST-feature-specific beams, the proposed decoder's efficiency and accuracy are also significantly improved.
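The basic beam pruning step that such improved pruning strategies build on can be sketched as follows; this is a generic token-passing illustration, not the paper's specific WFST-feature beams.

```python
def beam_prune(tokens, beam):
    """Keep only decoding tokens within `beam` of the best cost.

    tokens: dict mapping a WFST state to its accumulated cost (lower is
    better). Everything worse than best + beam is discarded, which bounds
    the active search space per frame.
    """
    best = min(tokens.values())
    return {s: c for s, c in tokens.items() if c <= best + beam}
```

A tighter beam is faster but risks pruning the eventual best path; per-feature beams let the decoder spend search effort where errors are most likely.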


Biomedical Engineering and Informatics | 2012

Parallel implementation of neural networks training on graphic processing unit

Yong Liu; Yeming Xiao; Li Wang; Jielin Pan; Yonghong Yan

Recently, artificial neural networks (ANN), especially deep belief networks (DBN), have become increasingly popular in acoustic model training. To speed up ANN training, the graphics processing unit (GPU) is used. This paper gives the training details of a back-propagation (BP) neural network acoustic model for speech recognition on the GPU, including the application of parallel reduction and asynchronous execution between the CPU and the GPU. The result is 26 times faster than a single-threaded Intel® MKL (Math Kernel Library) implementation.
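The parallel reduction pattern used on the GPU can be illustrated in serial form: each `stride` step below corresponds to one synchronized round in which GPU threads combine pairs of partial results, so a sum over n values takes about log2(n) rounds. This is an illustration of the pattern only, not the paper's CUDA kernel.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Pairwise (tree) reduction as a GPU kernel would perform it.

    At each round, element i absorbs element i + stride; after
    ceil(log2(n)) rounds the result sits in element 0.
    """
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        # one synchronized round of parallel combines
        for i in range(0, n - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
    return vals[0]
```

In BP training this pattern is typically used for accumulating gradient statistics or error sums across threads, replacing a serial loop whose length grows with the minibatch size.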


International Conference on Natural Computation | 2008

Using Discriminative Training Techniques in Practical Intelligent Music Retrieval System

Ran Xu; Jielin Pan; Yonghong Yan

The development of speech recognition technology has made it possible for some intelligent query systems to use a voice interface. In this paper, we develop a pop-song music retrieval system for telecom carriers to facilitate interaction between end users and the music database. When trying to improve the system's performance, however, we found that some typical recognition optimization techniques for large vocabulary continuous speech recognition (LVCSR) are not practicable for such a real-time application, in which both accuracy and speed are highly stressed. Thus, model optimization techniques are considered. Feature discriminative analysis and minimum phone error discriminative training have achieved great success in LVCSR in recent years; however, there are few reports about their practical application to online grammar-constrained recognition tasks. In this paper, these techniques are employed and evaluated on such a real-time recognition task. The experimental results show that they can be effectively implemented in our practical application system, with a remarkable error rate reduction of 13.3%.


International Conference on Signal and Information Processing | 2015

Improving HMM/DNN in ASR of under-resourced languages using probabilistic sampling

Meixu Song; Qingqing Zhang; Jielin Pan; Yonghong Yan

In HMM/DNN automatic speech recognition (ASR) systems, the DNNs model the posterior probabilities of triphone states. However, triphone states are unevenly distributed, so the training algorithm tends to converge to a local optimum more related to states with rich data than to states with poor data. This imbalance of the training data degrades ASR performance, especially for under-resourced languages. To deal with this issue, we explore a resampling technique called "probabilistic sampling", which can be seen as a linear smoothing between the original sampling distribution and the uniform one. The effectiveness of probabilistic sampling has been studied in two under-resourced ASR experiments. With probabilistic sampling, the first experiment achieved a 6.3% relative phone error rate (PER) reduction compared to the conventional DNN baseline; the second used a shared-hidden-layer multilingual DNN as the baseline and obtained a 4.9% relative PER reduction.
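The linear smoothing at the heart of probabilistic sampling can be sketched directly. The smoothing weight `lam` and the class-count input format are assumptions for illustration; with `lam = 0` the sketch reproduces the original (empirical) sampling and with `lam = 1` the uniform one.

```python
def probabilistic_sampling(class_counts, lam):
    """Per-class sampling probabilities interpolating empirical and uniform.

    class_counts: dict mapping a class (e.g. a triphone state) to its
    number of training frames. Returns q(c) = lam * 1/K + (1 - lam) * p(c),
    where p is the empirical class distribution and K the class count.
    """
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: lam / k + (1 - lam) * n / total
            for c, n in class_counts.items()}
```

Minibatches are then drawn by first picking a class from this distribution and then a frame uniformly within that class, which upweights data-poor states without discarding any data.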


International Conference on Natural Computation | 2014

Improved Mandarin spoken term detection by using deep neural network for keyword verification

Xuyang Wang; Ta Li; Yeming Xiao; Jielin Pan; Yonghong Yan

In this paper, we propose to use a Deep Neural Network (DNN), which has proved to be the state-of-the-art technique in speech recognition, to re-estimate the confidence of keyword hypotheses in the verification stage of spoken term detection. A DNN-based speech recognition system outperforms one based on a conventional Gaussian Mixture Model (GMM) but suffers from increased decoding time. When the speed of decoding or indexing is critical, utilizing a DNN in keyword verification appears to trade speed for performance. Inspired by the utilization and acceleration of DNNs in the decoding stage, we explore an efficient method to replace the GMM with a DNN in the verification stage. A 5% relative reduction of equal error rate (EER) is achieved, and the improvement of recall in the high-precision region is especially significant, which is essential for practical tasks. Meanwhile, the search time decreases by more than 50% compared to verification with an unrefined DNN.
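One common way to turn DNN outputs into a keyword confidence is a geometric mean of the per-frame posteriors of the keyword's aligned states; this is an assumed scoring rule for illustration, not necessarily the paper's exact re-estimation.

```python
import math

def keyword_confidence(frame_posteriors):
    """Geometric mean of per-frame DNN posteriors over a keyword hypothesis.

    frame_posteriors: the DNN posterior of the aligned state at each frame
    of the detected keyword span. Averaging in the log domain keeps long
    keywords comparable to short ones; the floor avoids log(0).
    """
    logs = [math.log(max(p, 1e-10)) for p in frame_posteriors]
    return math.exp(sum(logs) / len(logs))
```

A hypothesis is accepted when this confidence exceeds a tuned threshold; a single very low-posterior frame (a likely misalignment) drags the score down sharply, which is the desired behavior in the high-precision region.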


Global Congress on Intelligent Systems | 2013

Improving Korean LVCSR with Long-Time Temporal Patterns and an Extended Phoneme Set

Ji Xu; Zhen Zhang; Qingqing Zhang; Jielin Pan; Yonghong Yan

Korean is an agglutinative language in which pronunciations are affected by long-term context. In this paper, long-time temporal information is investigated to improve Korean LVCSR. TRAP-based MLP features, which are able to exploit acoustic information scattered over several hundred milliseconds, are employed to obtain additional information beyond the conventional cepstral features. In contrast to the traditional Korean phoneme set, in which consonants in the initial and final positions are treated as the same, a more specific phoneme set is constructed by making consonants position-dependent. On a Korean broadcast news speech recognition task, experiments show that with these improvements the character error rate is reduced by 25.3% relative to the baseline system.
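The position-dependent phoneme mapping can be sketched as follows. The syllable representation and the `_I`/`_F` suffixes are illustrative (romanized, not the paper's actual Korean inventory): the same consonant symbol becomes two distinct acoustic units depending on whether it is a syllable onset or a coda.

```python
def extend_phones(syllables):
    """Split consonants into position-dependent units.

    syllables: list of (onset, vowel, coda) tuples, with '' for an
    absent slot. An onset consonant C becomes C_I, a coda becomes C_F,
    so the two positions get separate acoustic models.
    """
    units = []
    for onset, vowel, coda in syllables:
        if onset:
            units.append(onset + "_I")
        units.append(vowel)
        if coda:
            units.append(coda + "_F")
    return units
```

This roughly doubles the consonant inventory but lets the models capture the systematically different realizations of initial and final consonants.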

Collaboration


Dive into Jielin Pan's collaborations.

Top Co-Authors

Yonghong Yan, Chinese Academy of Sciences
Ta Li, Chinese Academy of Sciences
Qingqing Zhang, Chinese Academy of Sciences
Yujing Si, Chinese Academy of Sciences
Qingwei Zhao, Chinese Academy of Sciences
Shang Cai, Chinese Academy of Sciences
Yeming Xiao, Chinese Academy of Sciences
Zhen Zhang, Chinese Academy of Sciences
Changchun Bao, Chinese Academy of Sciences
Weiqun Xu, Chinese Academy of Sciences