Xiaoying Xu
Beijing Normal University
Publication
Featured research published by Xiaoying Xu.
Speech Communication | 2015
Ya Li; Jianhua Tao; Keikichi Hirose; Xiaoying Xu; Wei Lai
Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of the pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced to model the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network-based prosodic variation model are proposed and then used in the stress adaptation module in HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech. The hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model. The pitch contour is then directly generated by the superposition of the two-level commands of the Fujisaki model. Experimental results using the proposed hierarchical stress modeling and generation methods showed that the macro- and microcharacteristics of stress could be successfully captured. The methodology proposed in this paper has application to a range of areas such as conveying attitude and indicating focus in spoken dialog systems.
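The superposition of phrase and tone commands mentioned in this abstract can be illustrated with the standard textbook form of the Fujisaki model, in which the logarithm of F0 is a base value plus the summed responses of the two command types. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are commonly used illustrative values, and nothing here reproduces how the paper's hierarchical stress model sets the command timings or amplitudes.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism:
    Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent/tone control mechanism:
    Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, base_f0, phrase_cmds, tone_cmds,
                alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 (Hz) at time t (seconds).

    phrase_cmds: list of (onset_time, amplitude) impulses.
    tone_cmds:   list of (onset_time, offset_time, amplitude) pedestals.
    All command values are illustrative inputs, not values from the paper.
    """
    log_f0 = math.log(base_f0)
    # Phrase level: each impulse adds a slowly decaying component.
    for onset, amp in phrase_cmds:
        log_f0 += amp * phrase_response(t - onset, alpha)
    # Tone level: each pedestal adds a local rise (or fall, if amp < 0).
    for onset, offset, amp in tone_cmds:
        log_f0 += amp * (tone_response(t - onset, beta, gamma)
                         - tone_response(t - offset, beta, gamma))
    return math.exp(log_f0)


if __name__ == "__main__":
    phrases = [(0.0, 0.5)]
    tones = [(0.2, 0.5, 0.4)]
    for step in range(0, 11):
        t = step * 0.1
        print(f"t={t:.1f}s  F0={fujisaki_f0(t, 100.0, phrases, tones):.1f} Hz")
```

In a stress-aware setting, one would presumably scale the command amplitudes according to the predicted stress level; that mapping is the paper's contribution and is not modeled in this sketch.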
Affective Computing and Intelligent Interaction | 2009
Xiaoying Xu; Ya Li; Liping Hu; Jianhua Tao
The paper proposes three principles and a clear guideline to create a large-scale Chinese sentiment lexicon for opinion mining systems. The comparative analysis between the criteria used in ready-made resources and the subjectivity and polarity used in opinion mining systems is also discussed in the paper. With the analysis results, we manually categorize 116,533 entries of HowNet terms for the lexicon according to the guideline. Two experiments are conducted to investigate the reliability of manual subjectivity labeling of the terms. The result of the first experiment shows significantly high agreement between different annotators, while the second experiment demonstrates the effectiveness of our lexicon in judging the polarity of subjective sentences, given the simple recognition method, and verifies the high reliability of our approach.
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014
Wei Lai; Xiaoying Xu; Ya Li; Hao Che; Shanfeng Liu; Jianhua Tao
Despite the discovery of the final lowering effect in a wide range of languages, its origin and realization in different phonological environments still need exploration. In this article, with a large dialogue corpus, three experiments are conducted to examine how phonological factors (such as prosodic units, sentence stresses and boundary pitch movement) influence the realization of final lowering in Mandarin Chinese. The results show that: I) The bearing unit of final lowering in Chinese is the last prosodic word in the utterance, regardless of its length, rather than a fixed, physiologically determined duration range. II) The position of the sentence stress has an influence on the presence or absence of final lowering. To be specific, final lowering tends to be triggered by sentence stresses on the penultimate and third-to-last prosodic words, and suppressed by sentence stresses prior to the third-to-last prosodic word. III) The final lowering effect is pushed leftward by sentence stresses and high boundary tones in final positions. This article lends support to the phonological origin of final lowering, and introduces a cross-linguistic framework of prosodic structure to analyze its specific realization under different conditions of stress positions and boundary pitch movements.
International Symposium on Chinese Spoken Language Processing | 2014
Xin Xu; Ya Li; Xiaoying Xu; Zhengqi Wen; Hao Che; Shanfeng Liu; Jianhua Tao
Increasing attention has recently been paid to the study of the emotional content of speech signals. Most studies focus on classification schemes, and few of them concentrate on the selection of suitable features for speech emotion recognition. Current studies still encounter many bottlenecks in both speech emotion recognition and emotional speech synthesis because of the lack of fine modeling of emotions. This paper takes two basic emotions, happiness and sadness, as examples to explore the fine modeling of emotions, aiming to provide the necessary phonetic clues for speech emotion recognition and emotional speech synthesis. Thirty-three acoustic features are selected to form the original feature set, based on the comprehensive feature analysis in the first step. The best-performing feature subset is chosen according to the recognition experiments, which in turn verifies the results of the comprehensive feature analysis. The experimental results show that the emotion recognition model using the feature set we finally select achieves higher classification accuracy than models using other feature sets, and that the smaller feature set can reduce model complexity. The importance of each acoustic feature is also analyzed in this paper.
2012 International Conference on Speech Database and Assessments | 2012
Xiaoying Xu; Ya Li; Jianhua Tao; Xuefei Liu
In recent years, both metaphor interpretation and opinion mining have drawn much attention in the natural language processing (NLP) field. This paper aims to make a connection between these two fields. We propose to extend the glossary orientation annotation to the vehicle (the source concept part of the metaphor) by using an automatic annotation method, and, based on the vehicle orientation corpus, we parse the metaphor's polarity after extracting the metaphor (especially the simile) from the large-scale corpus. Two experiments are conducted to investigate the reliability of our proposal. The result of the first experiment shows that the proposed method obtains better results than the system we proposed in 2009 in both precision and recall, while the result of the second experiment shows that more than 65% of metaphors have a very stable sentiment orientation. Generally, the results demonstrate the effectiveness of our approach and verify its high reliability.
Speech Communication | 2017
Ya Li; Jianhua Tao; Wei Lai; Xiaoying Xu
Previous intonational research on Mandarin has mainly focused on the prosody modeling of statements or the prosody analysis of interrogative sentences. To support related speech technologies, e.g., Text-to-Speech, the quantitative modeling of the intonation of interrogative sentences with a large-scale corpus still deserves attention. This paper summarizes our work on the quantitative prosody modeling of interrogative sentences in Mandarin. A large-scale natural speech corpus was used in this study. By extracting the pitch contours and fitting the intonation curves, we found that F0 declination and final lowering both existed in interrogative sentences, although some previous studies claimed they were absent in Mandarin. In addition, the declination function could be modeled linearly, and the bearing unit of final lowering in Mandarin was found to be the last prosodic word in the utterance, regardless of its length, rather than a fixed duration range. It was argued in this study that the difference between this finding and the commonly believed rising intonation of interrogative sentences resulted from the nonlinear relationship between prosody production and perception. The underlying mechanism for the existence of F0 declination and final lowering in interrogative sentences is also discussed.
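The statement that the declination function can be modeled linearly can be illustrated with an ordinary least-squares line fitted through F0 samples of one utterance. This is only a sketch under assumptions: the abstract does not say which points are fitted (for example topline, baseline, or all voiced frames) or on which scale (Hz or semitones), and the function name and toy data below are hypothetical.

```python
def fit_declination(times, f0_values):
    """Fit f0 = slope * t + intercept by ordinary least squares.

    times:     sample times in seconds.
    f0_values: F0 samples on any consistent scale (Hz here, as an assumption).
    Returns (slope, intercept); a negative slope indicates declination.
    """
    n = len(times)
    if n != len(f0_values) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_f = sum(f0_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all sample times are identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_values))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    # Toy contour, invented for illustration only.
    toy_times = [0.0, 0.5, 1.0, 1.5, 2.0]
    toy_f0 = [220.0, 215.0, 210.0, 205.0, 200.0]
    slope, intercept = fit_declination(toy_times, toy_f0)
    print(f"slope = {slope:.2f} Hz/s, intercept = {intercept:.2f} Hz")
```

Final lowering could then be examined as the residual between the observed F0 in the last prosodic word and the fitted line, although the paper's exact procedure is not given in the abstract.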
International Symposium on Chinese Spoken Language Processing | 2016
Wei Lai; Mark Liberman; Jiahong Yuan; Xiaoying Xu
This study explored word-level prosodic strength in Mandarin Chinese as reflected by tone reduction on the second syllables in Tone4+Tone4 words, by examining the slope difference between the two consecutive tones as an indicator of tonal reduction. It was found that, firstly, the occurrence of tonal reduction is dependent on the internal structure of the word: words formed by apposition, (pseudo-)suffixation and replication were more vulnerable to tonal reduction than verb-object words and loanwords were. Secondly, with regard to language proficiency, higher-proficiency speakers show a larger amount of reduction than lower-proficiency speakers do. Thirdly, the prosodic position has an asymmetrical influence on tone realization for words susceptible or unsusceptible to tone reduction: words susceptible to reduction show heavier reduction in utterance-final positions than in non-final positions, while words not susceptible to tonal reduction do not show heavier reduction in utterance-final positions, and even show slight strengthening on the second tone in some cases. It is argued that tone reduction in T4+T4 words reflects the existence of fine-grained degrees of prosodic strength intrinsic to lexical items.
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing | 2015
Jiahong Yuan; Xiaoying Xu; Wei Lai; Weiping Ye; Xinru Zhao; Mark Liberman
A central problem in research on automatic proficiency scoring is to differentiate the variability between and within groups of standard and non-standard speakers. Along with the effort to improve the robustness of techniques and models, we can also select test sentences that are more reliable for measuring the between-group variability. This study demonstrated that the performance of an automatic scoring system could be significantly improved by excluding “bad” sentences from the scoring procedure. The experiments on a dataset of Putonghua Shuiping Ceshi (Mandarin proficiency test) showed that, compared to using all available sentences, using only the best-performing sentences improved the speaker-level correlation between human and automatic scores from r = .640 to r = .824.
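The speaker-level correlation reported in this abstract is a Pearson r between human scores and automatic scores aggregated per speaker. The sketch below shows how such a figure could be computed before and after restricting the aggregation to a subset of sentences. It rests on assumptions: the abstract does not describe how "bad" sentences are identified, so the kept subset is simply given by hand; the helper names and toy scores are hypothetical; and the numbers do not reproduce the paper's results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def speaker_scores(per_speaker, keep):
    """Average each speaker's automatic sentence scores over the kept sentence ids."""
    return [sum(scores[s] for s in keep) / len(keep) for scores in per_speaker]


# Toy data, invented for illustration: one human score per speaker and
# automatic scores for three sentences per speaker.
HUMAN = [60.0, 70.0, 80.0, 90.0]
AUTO = [
    {"s1": 0.60, "s2": 0.62, "s3": 0.90},
    {"s1": 0.70, "s2": 0.69, "s3": 0.40},
    {"s1": 0.80, "s2": 0.81, "s3": 0.70},
    {"s1": 0.90, "s2": 0.88, "s3": 0.30},
]

if __name__ == "__main__":
    r_all = pearson_r(HUMAN, speaker_scores(AUTO, ["s1", "s2", "s3"]))
    # Assumption: "s3" has been judged unreliable by some external criterion.
    r_kept = pearson_r(HUMAN, speaker_scores(AUTO, ["s1", "s2"]))
    print(f"all sentences: r = {r_all:.3f}; selected sentences: r = {r_kept:.3f}")
```

In practice the selection criterion should be fixed on held-out data, since choosing sentences by their correlation on the same speakers used for evaluation would inflate the reported r.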
International Symposium on Chinese Spoken Language Processing | 2014
Xiaoying Xu; Huimin Wang; Ya Li; Wei Lai; Jianhua Tao
This paper introduces an audiobook, Pingfan de Shijie, that is rich in emotional information. By labeling the emotions of more than 6000 sentences separately on the basis of text context and of audio, the commonalities and differences between the two layers of annotation are checked from two aspects: whether the sentences are neutral or non-neutral, and which emotional categories the sentences belong to. The results show that subjective sentences in the text layer can be read as neutral speech, while a large number of objective sentences in the text layer must be read as affective speech. Finally, the causes of the annotation differences are analyzed, and considerations for emotional modeling from text to speech in speech synthesis are raised.
Workshop on Chinese Lexical Semantics | 2013
Xiaoying Xu; Jianhua Tao; Ya Li
Affective language computing has drawn considerable interest in the natural language processing area, and multiple domains have been developed within it. Constructing an emotion corpus plays a fundamental role in affective language processing. Many recent works have tackled this issue and several emotion lexicons have been established. However, the task of the lexicon has not been examined clearly, and little attention has been paid to clarifying the role of the emotion word in different tasks. In this paper, three basic issues in establishing a task-oriented subjectivity lexicon for multiple application domains are discussed in depth: the principle of collecting and annotating the emotion terms in different tasks, the models of the emotion theory, and the composition of isolated and context-based terms. Finally, the method for building this lexicon is also examined.