David Janiszek
University of Avignon
Publication
Featured research published by David Janiszek.
IEEE Transactions on Speech and Audio Processing | 2003
Yannick Estève; Christian Raymond; R. De Mori; David Janiszek
This paper introduces new recognition strategies based on reasoning about results obtained with different Language Models (LMs). Strategies are built following the conjecture that the consensus among the results obtained with different models gives rise to different situations in which hypothesized sentences have different word error rates (WER) and may be further processed with other LMs. New LMs are built by data augmentation using ideas from latent semantic analysis and trigram analogy. Situations are defined by expressing the consensus among the recognition results produced with different LMs and by the number of unobserved trigrams in the hypothesized sentence. The diagnostic power of the use of observed trigrams or their corresponding class trigrams is compared with that of situations based on values of sentence posterior probabilities. In order to avoid or correct errors due to syntactic inconsistency of the recognized sentence, automata obtained by explanation-based learning are introduced and used in certain conditions. Semantic Classification Trees are introduced to provide sentence patterns expressing constraints of long-distance syntactic coherence. Results on a dialogue corpus provided by France Telecom R&D show that, starting from a WER of 21.87% on a test set of 1422 sentences, it is possible to subdivide the sentences into three sets characterized by automatically recognized situations. The first one has a coverage of 68% with a WER of 7.44%. The second one has various types of sentences with a WER around 20%. The third one contains 13% of the sentences, which should be rejected, with a WER around 49%. The second set characterizes sentences that should be processed with particular care by the dialogue interpreter, with the possibility of asking the user for confirmation.
International Conference on Acoustics, Speech, and Signal Processing | 2001
David Janiszek; R. De Mori; E. Bechet
A method is presented for augmenting word n-gram counts in a matrix which represents a 2-gram language model (LM). This method is based on numerical distances in a reduced space obtained by singular value decomposition. Rescoring word lattices in a spoken dialogue application using an LM containing augmented counts has led to a word error rate (WER) reduction of 6.5%. By further interpolating augmented counts with the counts extracted from a very large newspaper corpus, but only for selected histories, a total WER reduction of 11.7% was obtained. We show that this approach gives better results than a global count interpolation for all histories of the LM.
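The count augmentation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_bigram_counts`, the Gaussian distance kernel, the `rank` and `bandwidth` parameters, and the toy counts are all assumptions made for illustration. The sketch places each history word in a reduced space via a truncated singular value decomposition of the bigram count matrix, then adds to each row the counts of other histories weighted by a decreasing function of their distance.

```python
import numpy as np

def augment_bigram_counts(counts, rank=2, bandwidth=2.0):
    """Hypothetical sketch: counts[i, j] is the count of word j after history word i."""
    counts = np.asarray(counts, dtype=float)
    # Truncated SVD: each history word gets a vector in a rank-dimensional space.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    vectors = u[:, :rank] * s[:rank]
    # Pairwise Euclidean distances between history vectors.
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumed kernel: a Gaussian of the distance (the abstract does not specify one).
    weights = np.exp(-(dist / bandwidth) ** 2)
    np.fill_diagonal(weights, 0.0)  # a history's own counts are kept as they are
    # Each row receives distance-weighted contributions from the other rows.
    return counts + weights @ counts

# Toy 3-word vocabulary, made up for the example.
toy_counts = [[4, 0, 1],
              [3, 1, 0],
              [0, 5, 2]]
augmented = augment_bigram_counts(toy_counts)
```

In this sketch, bigrams that were never observed for a history can receive a non-zero count when nearby histories in the reduced space did observe them, which is the effect the abstract attributes to augmentation.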
Pattern Recognition Letters | 2004
Frédéric Béchet; R. De Mori; David Janiszek
A new augmentation method for counts to be used in language modeling is presented. It is based on word representations in a reduced space obtained with Singular Value Decomposition. A contribution to a count for a linguistic event x is obtained from the counts of observed events smoothed with a function of their distance from x. Experimental results on a spoken dialogue corpus show the performance of the proposed method, combined with maximum a posteriori probability adaptation, in terms of word error rate reduction.
IEEE International Conference on Fuzzy Systems | 2011
Julie Mauclair; Laurent Wendling; David Janiszek
This paper presents a study on merging confidence measures using fuzzy logic. Unlike previous approaches based on the notion of probability, we propose to model the uncertainty of the recognition hypotheses with the notion of possibility, using fuzzy reasoning. Four different confidence measures are developed, coming from different parts of a speech recognizer. Various merging methods are studied to improve the performance of the confidence measures. The methods are evaluated in terms of Confidence Error Rate (CER) and in terms of their Detection Error Tradeoff (DET) curves on a French broadcast news corpus. They are compared to some fuzzy logic aggregation techniques, among which the technique based on the Choquet integral yields a significant improvement in terms of CER.
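For readers unfamiliar with the Choquet integral mentioned in this abstract, the following is a minimal sketch of the standard discrete form applied to merging confidence scores. The function name `choquet_integral`, the two source names, and the fuzzy measure values are hypothetical; the paper itself uses four confidence measures and does not give its measure values here.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral.

    scores:  dict mapping a source name to its confidence in [0, 1].
    measure: dict mapping a frozenset of source names to a capacity in [0, 1];
             the full set of sources must map to 1.0.
    """
    # Visit sources from the lowest score to the highest.
    ordered = sorted(scores, key=scores.get)
    total = 0.0
    previous = 0.0
    for i, source in enumerate(ordered):
        # Coalition of all sources whose score is at least the current one.
        coalition = frozenset(ordered[i:])
        total += (scores[source] - previous) * measure[coalition]
        previous = scores[source]
    return total

# Hypothetical example with two confidence sources.
scores = {"acoustic": 0.8, "lm": 0.5}
measure = {
    frozenset({"acoustic", "lm"}): 1.0,
    frozenset({"acoustic"}): 0.6,
    frozenset({"lm"}): 0.3,
}
merged = choquet_integral(scores, measure)
```

When the measure is additive, the integral reduces to a weighted mean; a non-additive measure lets the merge reward or penalize particular groups of sources agreeing, which is what separates it from simple weighted averaging.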
International Conference on Tools with Artificial Intelligence | 2013
Adrien Dulac; Damien Pellier; Humbert Fiorino; David Janiszek
Sciences et technologies de l'information et de la communication en milieu éducatif : Analyse de pratiques et enjeux didactiques. | 2011
David Janiszek; Laetitia Boulc'H; Damien Pellier; Julie Mauclair; Georges-Louis Baron
Sciences et Technologies | 2011
David Janiszek; Damien Pellier; Julie Mauclair; Laetitia Boulc'H; Georges-Louis Baron; Yannick Parchemal
International Technology, Education and Development Conference | 2011
David Janiszek; Damien Pellier; Julie Mauclair; Georges-Louis Baron; Y. Parchemal
Conference of the International Speech Communication Association | 2000
David Janiszek; Frédéric Béchet; Renato De Mori
7ème Colloque Technologies de l'Information et de la Communication pour l'Enseignement | 2010
David Janiszek; Damien Pellier; Julie Mauclair; Y. Parchemal; Georges-Louis Baron