Feilong Bao
Inner Mongolia University
Publication
Featured research published by Feilong Bao.
Chinese Conference on Pattern Recognition | 2009
Feilong Bao; Guanglai Gao
Research on Mongolian speech recognition started comparatively late and is still at an early stage. In this paper, we optimize the basic resources of a Mongolian speech recognition system and, most importantly, improve its acoustic model. We implement a continuous-density Gaussian mixture HMM and a multiple-data-stream semi-continuous HMM (SCHMM), both built on context-dependent phone models with decision-tree state clustering, and compare their performance. Finally, extensive experiments are carried out on the test set with HTK as the experimental platform, applying a trigram language model together with an acoustic model composed of context-dependent phone models, decision-tree clustering and the multiple-data-stream SCHMM. The results show that system performance is improved and that word- and sentence-level recognition accuracy increase substantially.
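As an illustration of the trigram language model component, the sketch below collects counts and evaluates an add-one-smoothed trigram probability in plain Python; the toy tokens and the smoothing choice are assumptions for illustration, not the HTK setup used in the paper.

```python
from collections import defaultdict

def train_trigram_counts(tokens):
    """Collect trigram and bigram counts from a token sequence."""
    tri, bi = defaultdict(int), defaultdict(int)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        tri[(a, b, c)] += 1
        bi[(a, b)] += 1
    return tri, bi, set(tokens)

def trigram_prob(tri, bi, vocab, a, b, c):
    """P(c | a, b) with add-one (Laplace) smoothing."""
    return (tri.get((a, b, c), 0) + 1) / (bi.get((a, b), 0) + len(vocab))

# Toy usage with placeholder romanized tokens (not real corpus data).
tokens = ["mongol", "kele", "yaria", "tanilt", "kele", "yaria", "sistem"]
tri, bi, vocab = train_trigram_counts(tokens)
print(trigram_prob(tri, bi, vocab, "kele", "yaria", "tanilt"))
```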
CCL | 2015
Hui Zhang; Feilong Bao; Guanglai Gao
Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are needed. Recently, speech recognition research has achieved major improvements by introducing Deep Neural Networks (DNNs). In this study, a DNN-based Mongolian LVCSR system is built. Experimental results show that the DNN-based models outperform conventional models based on Gaussian Mixture Models (GMMs) for Mongolian speech recognition by a large margin. Compared with the best GMM-based model, the DNN-based one obtains a relative improvement of over 50% and becomes a new state-of-the-art system in this field.
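A minimal sketch of a DNN acoustic model of the kind described, assuming PyTorch; the layer sizes, frame context and senone count are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

num_senones = 3000          # assumed number of tied triphone states
feat_dim = 40 * 11          # e.g. 40-dim Fbank with +/-5 frames of context (assumption)

# Feed-forward acoustic model: spliced frames -> senone posteriors (logits).
dnn_acoustic_model = nn.Sequential(
    nn.Linear(feat_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, num_senones),
)

# One dummy training step on random frames to show the shape of the training loop.
frames = torch.randn(32, feat_dim)              # batch of spliced feature frames
targets = torch.randint(0, num_senones, (32,))  # forced-alignment state labels
loss = nn.CrossEntropyLoss()(dnn_acoustic_model(frames), targets)
loss.backward()
```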
National Conference on Man-Machine Speech Communication | 2017
Rui Liu; Feilong Bao; Guanglai Gao; Yonghe Wang
Recently, the Deep Neural Network (DNN), a feed-forward artificial neural network with many hidden layers, has opened a new research direction for speech synthesis. It can represent high-dimensional, correlated features efficiently and model highly complex mapping functions compactly. However, DNN-based Mongolian speech synthesis has not yet been explored. This paper is the first to apply a DNN-based acoustic model to Mongolian speech synthesis, building a Mongolian speech synthesis system around Mongolian characters and acoustic features. Compared with a conventional HMM-based system trained on the same corpus, the DNN-based system synthesizes better Mongolian speech. The Mean Opinion Score (MOS) of the synthesized Mongolian speech is 3.83, making it a new state-of-the-art system in this field.
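In contrast to the ASR case above, the synthesis-side acoustic model is a regression from linguistic context features to acoustic parameters; a minimal sketch assuming PyTorch, with illustrative dimensions, follows.

```python
import torch
import torch.nn as nn

ling_dim, acoustic_dim = 300, 187   # assumed sizes: context features -> per-frame acoustic parameters

synthesis_dnn = nn.Sequential(
    nn.Linear(ling_dim, 512), nn.Tanh(),
    nn.Linear(512, 512), nn.Tanh(),
    nn.Linear(512, acoustic_dim),   # predicted acoustic parameters for one frame
)

# Dummy regression step with mean-squared error, as typically used for synthesis DNNs.
features = torch.randn(16, ling_dim)
loss = nn.MSELoss()(synthesis_dnn(features), torch.randn(16, acoustic_dim))
loss.backward()
```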
International Conference on Asian Language Processing | 2016
Rui Liu; Feilong Bao; Guanglai Gao; Weihua Wang
Accurate prosodic phrase prediction can improve the naturalness of speech synthesis. Prosodic phrase prediction can be regarded as a sequence labeling problem, typically solved with a Conditional Random Field (CRF). Mongolian is an agglutinative language in which a massive number of words can be formed by concatenating stems and suffixes. This characteristic makes it difficult to build a high-performance CRF-based Mongolian prosodic phrase prediction system. We introduce a new method that segments each Mongolian word into its stem and suffixes as individual tokens. The proposed method integrates multiple features according to the characteristics of Mongolian word formation. We conduct contrastive experiments with the following features: word, multi-level part-of-speech (POS), multi-level suffix lexical features and suffix existence. The experimental results show that our method significantly improves the performance of the Mongolian prosodic phrase prediction system compared with the conventional method that treats each Mongolian word directly as a token. The word feature, level-one suffix lexical feature and suffix existence feature are effective. The best result reaches an F1-measure of 82.49%.
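The sketch below shows how stem/suffix tokens might be turned into CRF feature dictionaries; the feature template is an assumption in the spirit of the features listed above, not the paper's exact template.

```python
def token_features(tokens, i):
    """Build a CRF feature dict for the i-th token.

    Each token is a (surface, pos, is_suffix) tuple produced by the
    stem/suffix segmentation described above.
    """
    surface, pos, is_suffix = tokens[i]
    feats = {"token": surface, "pos": pos, "is_suffix": is_suffix}
    if i > 0:
        prev_surface, prev_pos, _ = tokens[i - 1]
        feats.update({"-1:token": prev_surface, "-1:pos": prev_pos})
    if i < len(tokens) - 1:
        next_surface, next_pos, _ = tokens[i + 1]
        feats.update({"+1:token": next_surface, "+1:pos": next_pos})
    return feats

# A toy segmented sentence: stems and suffixes are separate tokens.
sentence = [("ger", "N", False), ("-tee", "SFX", True), ("hari", "V", False)]
X = [token_features(sentence, i) for i in range(len(sentence))]
# X would then be fed, with B/I/O-style phrase-boundary labels, to a CRF toolkit.
```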
China National Conference on Chinese Computational Linguistics | 2016
Xiaofei Yan; Feilong Bao; Hongxi Wei; Xiangdong Su
In Mongolian, many words share the same presentation form but are actually different words with different codes. Since typists usually input words according to their presentation forms and sometimes cannot distinguish the codes, Mongolian corpora contain many coding errors, which makes statistics and retrieval on such corpora very difficult. To solve this problem, this paper proposes a method that merges words with the same presentation form into intermediate characters and then builds a Mongolian language model on the intermediate-character corpus. Experimental results show that the proposed method reduces the perplexity and word error rate of the 3-gram language model by 41% and 30% respectively, compared with a model trained on the unprocessed corpus. The proposed approach significantly improves the performance of the Mongolian language model and greatly enhances the accuracy of Mongolian speech recognition.
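A sketch of the merging step might look like the following; the homoglyph table is hypothetical, as the actual intermediate-character mapping depends on which Mongolian code points share a presentation form in the corpus at hand.

```python
# Hypothetical mapping from code-level variants to shared intermediate symbols.
HOMOGLYPH_TO_INTERMEDIATE = {
    "\u1823": "#OU",   # MONGOLIAN LETTER O -> shared intermediate symbol (assumed)
    "\u1824": "#OU",   # MONGOLIAN LETTER U -> same symbol (same glyph shape)
}

def to_intermediate(word):
    """Collapse code-level variants that share a presentation form."""
    return "".join(HOMOGLYPH_TO_INTERMEDIATE.get(ch, ch) for ch in word)

def normalize_corpus(lines):
    """Normalize a corpus line by line before training the 3-gram LM."""
    return [" ".join(to_intermediate(w) for w in line.split()) for line in lines]
```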
National CCF Conference on Natural Language Processing and Chinese Computing | 2017
Yonghe Wang; Feilong Bao; Hongwei Zhang; Guanglai Gao
Deep Neural Network (DNN) models have achieved significant results on the Mongolian speech recognition task; however, compared with Chinese, English and other languages, there is still room for improvement. This paper presents the first application of the Feed-forward Sequential Memory Network (FSMN) to Mongolian speech recognition, modeling long-term dependencies in time series without recurrent feedback. Furthermore, by modeling the speaker in the feature space, we extract i-vector features and combine them with Fbank features as the input, validating their effectiveness on Mongolian ASR tasks. Finally, discriminative training is applied to the FSMN for the first time, using maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR) respectively. The experimental results show that the FSMN outperforms the DNN on Mongolian ASR, and that using i-vector features combined with Fbank features as FSMN input, together with discriminative training, reduces the word error rate (WER) by a relative 17.9% compared with the DNN baseline.
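The core of an FSMN layer is a learned weighted sum over the current and past hidden activations; a minimal numpy sketch of such a lookback-only memory block, with an assumed memory order, is shown below.

```python
import numpy as np

def fsmn_memory(hidden, coeffs):
    """FSMN memory block (lookback only).

    hidden: (T, D) hidden activations; coeffs: (N+1, D) learnable taps.
    Returns m_t = sum_i coeffs[i] * hidden[t - i], a learned weighted sum
    over the current frame and N past frames.
    """
    T, _ = hidden.shape
    N = coeffs.shape[0] - 1
    memory = np.zeros_like(hidden)
    for t in range(T):
        for i in range(N + 1):
            if t - i >= 0:
                memory[t] += coeffs[i] * hidden[t - i]
    return memory

hidden = np.random.randn(100, 512)   # 100 frames of 512-dim activations
coeffs = np.random.randn(21, 512)    # order-20 lookback memory (assumed)
print(fsmn_memory(hidden, coeffs).shape)  # (100, 512)
```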
International Conference on Asian Language Processing | 2015
Weihua Wang; Feilong Bao; Guanglai Gao
Mongolian is an agglutinative language with complex morphological structure, so building an accurate Named Entity Recognition (NER) system for Mongolian is challenging and meaningful. This paper analyzes the characteristics of Mongolian suffixes attached with the Narrow No-Break Space and investigates three Mongolian NER methods within the Conditional Random Field (CRF) framework. Experiments show that segmenting each suffix into an individual token performs better than both leaving words unsegmented and using the suffixes only as features. Our approach obtains an F-measure of 82.71 and is suitable for large-vocabulary Mongolian NER. The findings are also relevant to NER systems for other agglutinative languages.
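The best-performing strategy, segmenting each suffix into its own token, can be sketched as follows; written Mongolian attaches case suffixes with the Narrow No-Break Space (U+202F), and the "@" suffix marker used here is an assumed convention, not the paper's notation.

```python
NNBSP = "\u202f"   # Narrow No-Break Space used before Mongolian suffixes

def segment_suffixes(sentence):
    """Split each word at the NNBSP so every suffix becomes its own token."""
    tokens = []
    for word in sentence.split():
        parts = word.split(NNBSP)
        tokens.append(parts[0])                      # stem
        tokens.extend("@" + s for s in parts[1:])    # suffixes, marked with "@"
    return tokens

# The resulting token sequence is then labeled with B/I/O entity tags under a CRF.
```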
Chinese Conference on Pattern Recognition | 2014
Xiangdong Su; Guanglai Gao; Weihua Wang; Feilong Bao; Hongxi Wei
Many classical Mongolian historical documents are preserved only in image form, which makes it inconvenient to search and mine their content. To facilitate word recognition during document digitization, this paper proposes a novel approach to segmenting historical words whose characters are intrinsically connected and exhibit considerable overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the approximated polygon of the word's external contour, (2) baseline location based on a logistic regression model, and (3) segmentation path generation and validation based on heuristic rules and a neural network. The SCPs support both baseline location and segmentation path generation. Experiments on the historical Mongolian Kanjur demonstrate that our approach effectively locates word baselines and segments words into characters.
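Step (2), baseline location with logistic regression, might be sketched as follows using scikit-learn; the per-SCP features shown are hypothetical stand-ins for whatever geometric cues the paper actually uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: hypothetical features of one significant contour point (SCP),
# e.g. normalized vertical position, distance to the word axis, local stroke width.
X_train = np.array([[0.10, 0.02, 3.0],
                    [0.55, 0.01, 2.5],
                    [0.80, 0.30, 6.0]])
y_train = np.array([1, 1, 0])   # 1 = SCP lies on the baseline

clf = LogisticRegression().fit(X_train, y_train)

scp_candidates = np.array([[0.40, 0.03, 2.8]])
print(clf.predict_proba(scp_candidates)[:, 1])  # probability of being a baseline point
```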
Chinese Conference on Pattern Recognition | 2012
Feilong Bao; Guanglai Gao; Yulai Bao; Xiangdong Su
In this paper, we present a baseline spoken term detection (STD) system for Mongolian speech data. First, Mongolian speech is recognized into n-best word lattices by a Mongolian ASR system, and the lattices are converted into word confusion networks. Second, the individual arcs of the confusion networks are indexed into an efficient inverted index. Finally, we search this index to extract keyword candidates and re-rank the results. Experiments show that the system achieves favorable results for Mongolian in-vocabulary (IV) query term detection.
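A minimal sketch of the indexing and search steps over confusion network arcs, in plain Python; the arc tuple layout and posterior-based ranking are assumptions for illustration.

```python
from collections import defaultdict

def build_inverted_index(confusion_networks):
    """confusion_networks: iterable of (utt_id, [(word, start, end, posterior), ...])."""
    index = defaultdict(list)
    for utt_id, arcs in confusion_networks:
        for word, start, end, posterior in arcs:
            index[word].append((utt_id, start, end, posterior))
    return index

def search(index, query_term):
    """Return candidate hits for a query term, ranked by arc posterior."""
    return sorted(index.get(query_term, []), key=lambda hit: -hit[3])

cns = [("utt1", [("mongol", 0.0, 0.4, 0.9), ("kele", 0.4, 0.7, 0.6)])]
index = build_inverted_index(cns)
print(search(index, "mongol"))
```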
Pacific Rim International Conference on Artificial Intelligence | 2018
Rui Liu; Feilong Bao; Guanglai Gao; Hui Zhang; Yonghe Wang
Phrase break prediction is the first and most important component for increasing the naturalness and intelligibility of text-to-speech (TTS) systems. Most works rely on language-specific resources, large annotated corpora and feature engineering to perform well. However, phrase break prediction from text for Mongolian speech synthesis remains a great challenge because of data sparsity caused by the scarcity of resources. In this paper, we introduce a Bidirectional Long Short-Term Memory (BiLSTM) model with an attention mechanism that uses position-based enhanced phonological representations, word embeddings and character embeddings to achieve state-of-the-art performance. The position-based enhanced phonological representations, derived from a separate BiLSTM model, are composed of phoneme and syllable embeddings that carry position information. Using the attention mechanism, the model can dynamically decide how much information to use from the word or phonological components. To handle the out-of-vocabulary (OOV) problem, we combine word, phonological and character embeddings as inputs to the model. Experimental results show that the proposed method significantly outperforms systems that use only word embeddings, by successfully leveraging position-based phonological information and the attention mechanism.
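A compact sketch of the stream-level attention idea, assuming PyTorch: the model attends over word, phonological and character representations before a BiLSTM tagger. All vocabulary sizes, dimensions and the tagging head are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PhraseBreakTagger(nn.Module):
    def __init__(self, word_vocab=10000, phon_dim=64, char_dim=64,
                 emb_dim=64, hidden=128, num_tags=2):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, emb_dim)
        # Project each stream to a shared size so attention can mix them.
        self.proj = nn.ModuleList([nn.Linear(d, emb_dim) for d in (emb_dim, phon_dim, char_dim)])
        self.attn = nn.Linear(emb_dim, 1)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)   # break / no-break per word

    def forward(self, word_ids, phon_feats, char_feats):
        streams = [self.word_emb(word_ids), phon_feats, char_feats]
        projected = torch.stack([p(s) for p, s in zip(self.proj, streams)], dim=2)  # (B, T, 3, E)
        weights = torch.softmax(self.attn(projected), dim=2)   # how much of each stream to use
        mixed = (weights * projected).sum(dim=2)               # (B, T, E)
        hidden_states, _ = self.bilstm(mixed)
        return self.out(hidden_states)                         # (B, T, num_tags)

model = PhraseBreakTagger()
logits = model(torch.randint(0, 10000, (2, 7)), torch.randn(2, 7, 64), torch.randn(2, 7, 64))
print(logits.shape)  # torch.Size([2, 7, 2])
```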