Publication


Featured research published by Yangyang Shi.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

K-component recurrent neural network language models using curriculum learning

Yangyang Shi; Martha Larson; Catholijn M. Jonker

Conventional n-gram language models are known for their limited ability to capture long-distance dependencies and their brittleness with respect to within-domain variations. In this paper, we propose a k-component recurrent neural network language model using curriculum learning (CL-KRNNLM) to address within-domain variations. Based on a Dutch-language corpus, we investigate three methods of curriculum learning that exploit dedicated component models for specific sub-domains. Under an oracle situation in which context information is known during testing, we experimentally test three hypotheses. The first is that domain-dedicated models perform better than general models on their specific domains. The second is that curriculum learning can be used to train recurrent neural network language models (RNNLMs) from general patterns to specific patterns. The third is that curriculum learning, used as an implicit weighting method to adjust the relative contributions of general and specific patterns, outperforms conventional linear interpolation. Under the condition that context information is unknown during testing, the CL-KRNNLM also achieves a 13% relative improvement over the conventional RNNLM in terms of word prediction accuracy. Finally, the CL-KRNNLM is tested in an additional experiment involving N-best rescoring on a standard data set. Here, the context domains are created by clustering the training data using Latent Dirichlet Allocation and k-means clustering.
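
The clustering step described above (context domains created from the training data with Latent Dirichlet Allocation followed by k-means) can be sketched as follows. This is a minimal illustration using scikit-learn with made-up documents and hypothetical variable names, not the authors' code or data.

    # Minimal sketch: partition training texts into k context domains by
    # clustering their LDA topic mixtures with k-means (assumed workflow,
    # not the authors' original implementation).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.cluster import KMeans

    documents = [
        "the central bank raised interest rates again",
        "the football match ended with a late penalty",
        "parliament debated the new climate bill",
        "the striker scored twice in the second half",
    ]

    # Bag-of-words counts feed the LDA topic model.
    counts = CountVectorizer().fit_transform(documents)

    # Infer a low-dimensional topic mixture for each document.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_mixtures = lda.fit_transform(counts)

    # k-means on the topic mixtures yields k context domains; the documents
    # of each domain would then train one component language model.
    k = 2
    domains = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(topic_mixtures)
    for doc, domain in zip(documents, domains):
        print(domain, doc)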


Text, Speech and Dialogue | 2011

Combining topic specific language models

Yangyang Shi; Pascal Wiggers; Catholijn M. Jonker

In this paper we investigate whether a combination of topic specific language models can outperform a general purpose language model, using a trigram model as our baseline model. We show that in the ideal case -- in which it is known beforehand which model to use -- specific models perform considerably better than the baseline model. We test two methods that combine specific models and show that these combinations outperform the general purpose model, in particular if the data is diverse in terms of topics and vocabulary. Inspired by these findings, we propose to combine a decision tree and a set of dynamic Bayesian networks into a new model. The new model uses context information to dynamically select an appropriate specific model.
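
The two combination regimes discussed in this abstract, an oracle that always picks the matching topic-specific model and a linear interpolation of the specific models, can be illustrated with toy unigram models. The vocabularies, probabilities and weights below are invented for illustration and do not come from the paper.

    # Toy illustration (not the paper's models): score a sentence either with
    # the oracle-selected topic model or with a linear interpolation of models.
    from math import log

    news_lm = {"parliament": 0.02, "bank": 0.01, "match": 0.001}
    sports_lm = {"parliament": 0.0005, "bank": 0.001, "match": 0.03}
    FLOOR = 1e-6  # crude floor for words a model has not seen

    def logprob(lm, sentence):
        return sum(log(lm.get(w, FLOOR)) for w in sentence)

    def interpolated_logprob(lms, weights, sentence):
        # P(w) = sum_i lambda_i * P_i(w), accumulated in log space per word
        return sum(log(sum(lam * lm.get(w, FLOOR) for lam, lm in zip(weights, lms)))
                   for w in sentence)

    sentence = ["parliament", "bank"]
    oracle_score = logprob(news_lm, sentence)  # topic known beforehand
    mixed_score = interpolated_logprob([news_lm, sports_lm], [0.7, 0.3], sentence)
    print(oracle_score, mixed_score)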


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Socio-situational setting classification based on language use

Yangyang Shi; Pascal Wiggers; Catholijn M. Jonker

We present a method for automatic classification of the socio-situational setting of a conversation based on the language used. The socio-situational setting depicts the social background of a conversation, which involves the communicative goals, the number of speakers, the number of listeners and the relationship among the speakers and the listeners. Knowledge of the socio-situational setting can be used to search for content recorded in a particular setting or to select context-dependent models, for example for speech recognition. We investigated the performance of different sets of conversation-level and word-level features, as well as their combinations, on this task. Our final system, which classifies the conversations in the Spoken Dutch Corpus into one of 14 socio-situational settings, achieves an accuracy of 89.55%.


Text, Speech and Dialogue | 2012

Adaptive Language Modeling with a Set of Domain Dependent Models

Yangyang Shi; Pascal Wiggers; Catholijn M. Jonker

An adaptive language modeling method is proposed in this paper. Instead of using one static model for all situations, it applies a set of specific models to dynamically adapt to the discourse. We present the general structure of the model and the training procedure. In our experiments, we instantiated the method with a set of domain-dependent models trained on different socio-situational settings (almosd). We compare it with previous topic-dependent and socio-situational-setting-dependent adaptive language models and with a smoothed n-gram model in terms of perplexity and word prediction accuracy. Our experiments show that almosd achieves perplexity reductions of up to almost 12% compared with the other models.
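
One common way to realize this kind of dynamic adaptation is to re-estimate the mixture weights of the component models from their posterior responsibilities as the discourse unfolds; the sketch below follows that idea with invented component models and is not the paper's exact update rule.

    # Hypothetical sketch of dynamic adaptation: after each observed word the
    # mixture weights shift towards the component that predicts it best.
    formal_lm = {"dear": 0.03, "sincerely": 0.02, "hey": 0.001}
    informal_lm = {"dear": 0.002, "sincerely": 0.0005, "hey": 0.04}
    components = [formal_lm, informal_lm]
    FLOOR = 1e-6

    weights = [0.5, 0.5]  # uniform prior over the component models
    for word in ["hey", "hey", "dear"]:
        probs = [lm.get(word, FLOOR) for lm in components]
        mixture = sum(w * p for w, p in zip(weights, probs))
        # Posterior responsibility of each component for the observed word.
        weights = [w * p / mixture for w, p in zip(weights, probs)]
        print(word, [round(w, 3) for w in weights])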


Speech Communication | 2015

Integrating meta-information into recurrent neural network language models

Yangyang Shi; Martha Larson; Joris Pelemans; Catholijn M. Jonker; Patrick Wambacq; Pascal Wiggers; Kris Demuynck

Recurrent neural network language models benefit from the integration of meta-information. Meta-information of various types, word-level, discourse-level, and intrinsic, is valuable. If meta-information can be robustly predicted, it has the potential to improve performance. Intrinsic meta-information, word- and sentence-length, is trivial to obtain, but still useful. The approach is validated with the Spoken Dutch Corpus and the Wall Street Journal data set.

Due to their advantages over conventional n-gram language models, recurrent neural network language models (RNNLMs) have recently attracted a fair amount of research attention in the speech recognition community. In this paper, we explore one advantage of RNNLMs, namely, the ease with which they allow the integration of additional knowledge sources. We concentrate on features that provide complementary information w.r.t. the lexical identities of the words. We refer to such information as meta-information. We single out three cases and investigate their merits by means of N-best list re-scoring experiments on a challenging corpus of spoken Dutch (referred to as CGN) as well as on the English Wall Street Journal (WSJ) corpus. First, we look at Parts of Speech (POS) tags and lemmas, two sources of word-level linguistic information that are known to make a contribution to the performance of conventional language models. We confirm that RNNLMs can benefit from these sources as well. Second, we investigate socio-situational settings (SSSs) and topics, two sources of discourse-level information that are also known to benefit language models. SSSs are present in the CGN data, and can be seen as a proxy for the language register. For the purposes of our investigation, we assume that information on the SSS can be captured at the moment at which speech is recorded. Topics, i.e., treatments of different subjects, are present in the WSJ data. In order to predict POS, lemmas, SSS and topic, a second RNNLM is coupled to the main RNNLM. We refer to this architecture as a recurrent neural network tandem language model (RNNTLM). Our experimental findings show that if high-quality meta-information labels are available, both word-level and discourse-level information improve the performance of language models. Third, we investigate sentence length and word length (i.e., token size), two sources of intrinsic information that are readily available for exploitation because they are known at the time of re-scoring. Intrinsic information has been largely overlooked by language modeling research. The results of the experiments on both the CGN and WSJ data show that integrating sentence length and word length can achieve improvement. RNNLMs allow these features to be incorporated with ease, and obtain improved performance.
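
A minimal way to integrate word-level meta-information into a recurrent language model is to embed the meta-label and concatenate it with the word embedding before the recurrent layer. The PyTorch sketch below illustrates only that input coupling, with invented sizes and names; it does not reproduce the paper's tandem (RNNTLM) architecture, in which a second RNNLM predicts the meta-information.

    # Sketch (assumed shapes and names): a recurrent LM whose input is the
    # concatenation of a word embedding and a meta-information embedding,
    # e.g. a POS-tag id per word.
    import torch
    import torch.nn as nn

    class MetaRNNLM(nn.Module):
        def __init__(self, vocab_size, meta_size, word_dim=64, meta_dim=8, hidden=128):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)
            self.meta_emb = nn.Embedding(meta_size, meta_dim)
            self.rnn = nn.LSTM(word_dim + meta_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, words, meta):
            # words, meta: (batch, seq_len) integer ids
            x = torch.cat([self.word_emb(words), self.meta_emb(meta)], dim=-1)
            h, _ = self.rnn(x)
            return self.out(h)  # next-word logits at every position

    model = MetaRNNLM(vocab_size=1000, meta_size=12)
    words = torch.randint(0, 1000, (2, 5))    # toy batch of word ids
    pos_tags = torch.randint(0, 12, (2, 5))   # toy POS-tag ids as meta-information
    print(model(words, pos_tags).shape)       # torch.Size([2, 5, 1000])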


Speech Communication | 2013

Classifying the socio-situational settings of transcripts of spoken discourses

Yangyang Shi; Pascal Wiggers; Catholijn M. Jonker

In this paper, we investigate automatic classification of the socio-situational settings of transcripts of a spoken discourse. Knowledge of the socio-situational setting can be used to search for content recorded in a particular setting or to select context-dependent models, for example in speech recognition. The subjective experiment we report on in this paper shows that people correctly classify 68% of the socio-situational settings. Based on the cues that participants mentioned in the experiment, we developed two types of automatic socio-situational setting classification methods: a static socio-situational setting classification method using support vector machines (S3C-SVM), and a dynamic socio-situational setting classification method applying dynamic Bayesian networks (S3C-DBN). Using these two methods, we developed classifiers applying various features and combinations of features. The S3C-SVM method with sentence length, function word ratio, single-occurrence word ratio, part-of-speech (POS) and word features results in a classification accuracy of almost 90%. Using a bigram S3C-DBN with POS tag and word features results in a dynamic classifier which obtains nearly 89% classification accuracy. The dynamic classifiers not only achieve results similar to those of the static classifiers, but can also track the socio-situational setting while processing a transcript or conversation. On discourses with a static socio-situational setting, the dynamic classifiers need only the initial 25% of the data to achieve a classification accuracy close to the accuracy achieved when all data of a transcript are used.
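
Three of the conversation-level features named above, sentence length, function-word ratio and single-occurrence word ratio, are straightforward to compute from a transcript. The sketch below is illustrative only; the tiny function-word list is a hypothetical stand-in, and the resulting feature vector would be combined with word and POS features before being fed to the SVM or DBN classifiers.

    # Toy feature extractor (illustrative stand-in, not the paper's code).
    from collections import Counter

    FUNCTION_WORDS = {"the", "a", "of", "and", "to", "in", "is", "that"}  # stand-in list

    def s3c_features(sentences):
        tokens = [w.lower() for s in sentences for w in s.split()]
        counts = Counter(tokens)
        n = len(tokens)
        return {
            "mean_sentence_length": n / len(sentences),
            "function_word_ratio": sum(counts[w] for w in FUNCTION_WORDS) / n,
            "single_occurrence_ratio": sum(1 for c in counts.values() if c == 1) / n,
        }

    transcript = ["good morning and welcome to the news",
                  "the minister said that the plan is ready"]
    print(s3c_features(transcript))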


Text, Speech and Dialogue | 2013

K-Component Adaptive Recurrent Neural Network Language Models

Yangyang Shi; Martha Larson; Pascal Wiggers; Catholijn M. Jonker

Conventional n-gram language models for automatic speech recognition are insufficient in capturing long-distance dependencies and brittle with respect to changes in the input domain. We propose a k-component recurrent neural network language model (KARNNLM) that addresses these limitations by exploiting the long-distance modeling ability of recurrent neural networks and by making use of k different sub-models trained on different contextual domains. Our approach uses Latent Dirichlet Allocation to automatically discover k subsets of the training data, which are used to train k component models. Our experiments first use a Dutch-language corpus to confirm the ability of KARNNLM to automatically choose the appropriate component. Then, we use a standard benchmark set (Wall Street Journal) to perform N-best list rescoring experiments. Results show that KARNNLM improves performance over the RNNLM baseline; the best performance is achieved when KARNNLM is combined with the general model using a novel iterative alternating N-best rescoring strategy.
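
The N-best rescoring used in these experiments follows the usual pattern of combining the recognizer's original hypothesis score with a new language-model score and re-ranking. The sketch below shows only that generic pattern with made-up scores; the paper's iterative alternating strategy between the component models and the general model is not reproduced.

    # Generic N-best rescoring sketch (illustrative scores and hypotheses).
    def rescore(nbest, lm_logprob, lm_weight=0.5):
        # nbest: list of (hypothesis, decoder_score); lm_logprob: hypothesis -> log-prob
        return max(nbest, key=lambda item: item[1] + lm_weight * lm_logprob(item[0]))

    nbest = [("the cat sat on the mat", -12.0),
             ("the cat sad on the mat", -11.5)]
    toy_lm = {"the cat sat on the mat": -8.0, "the cat sad on the mat": -15.0}
    best_hyp, _ = rescore(nbest, lambda hyp: toy_lm[hyp])
    print(best_hyp)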


International Conference on Acoustics, Speech, and Signal Processing | 2012

Dynamic Bayesian socio-situational setting classification

Yangyang Shi; Pascal Wiggers; Catholijn M. Jonker

We propose a dynamic Bayesian classifier for the socio-situational setting of a conversation. Knowledge of the socio-situational setting can be used to search for content recorded in a particular setting or to select context-dependent models in speech recognition. Compared to static classifiers such as naive Bayes and support vector machines, the dynamic Bayesian classifier has the advantage that it can continuously update the classification during a conversation. We experimented with several models that use lexical and part-of-speech information. Our results show that the prediction accuracy of the dynamic Bayesian classifier using the first 25% of a conversation is almost 98% of the final prediction accuracy, which is calculated on the entire conversation. The best final prediction accuracy, 88.85%, is obtained by bigram dynamic Bayesian classification using words and part-of-speech tags.
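
The core idea, updating the classification word by word so that a decision is available at any point in the conversation, can be shown with a much simpler stand-in than the paper's dynamic Bayesian network: a naive word-level Bayesian update over a couple of invented settings.

    # Minimal stand-in (not the paper's DBN): incrementally update the
    # posterior over socio-situational settings after every observed word.
    settings = {
        "lecture":    {"slide": 0.02, "question": 0.01, "goal": 0.002},
        "sportscast": {"slide": 0.0005, "question": 0.001, "goal": 0.03},
    }
    FLOOR = 1e-5
    posterior = {s: 0.5 for s in settings}  # uniform prior over settings

    for word in ["goal", "goal", "question"]:
        for s, lm in settings.items():
            posterior[s] *= lm.get(word, FLOOR)
        total = sum(posterior.values())
        posterior = {s: p / total for s, p in posterior.items()}
        # A classification can be read off here, mid-conversation.
        print(word, {s: round(p, 3) for s, p in posterior.items()})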


Conference of the International Speech Communication Association | 2013

Recurrent neural networks for language understanding.

Kaisheng Yao; Geoffrey Zweig; Mei-Yuh Hwang; Yangyang Shi; Dong Yu


Conference of the International Speech Communication Association | 2012

Towards Recurrent Neural Networks Language Models with Linguistic and Contextual Features.

Yangyang Shi; Pascal Wiggers; Catholijn M. Jonker

Collaboration


Dive into Yangyang Shi's collaborations.

Top Co-Authors

Catholijn M. Jonker
Delft University of Technology

Pascal Wiggers
Delft University of Technology

Martha Larson
Delft University of Technology

Peng Xu
Delft University of Technology

Joris Pelemans
Katholieke Universiteit Leuven

Kris Demuynck
Katholieke Universiteit Leuven