Publication


Featured research published by Pierre Lanchantin.


Signal Processing | 2011

Unsupervised segmentation of randomly switching data hidden with non-Gaussian correlated noise

Pierre Lanchantin; Jérôme Lapuyade-Lahorgue; Wojciech Pieczynski

Hidden Markov chains (HMC) are a very powerful tool for hidden data restoration and are currently used to solve a wide range of problems. However, when these data are not stationary, estimating the parameters required for unsupervised processing poses a problem. Moreover, taking correlated non-Gaussian noise into account is difficult without model approximations. The aim of this paper is to propose a simultaneous solution to both of these problems using triplet Markov chains (TMC) and copulas. The usefulness of the proposed models and the related processing is validated by different experiments, some of which are related to semi-supervised and unsupervised image segmentation.
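
As a rough illustration of the triplet Markov chain idea, the sketch below (not the authors' code; the function and variable names are assumptions) restores the hidden process X by running a standard forward-backward pass over the joint state V = (X, U), where U is an auxiliary process that absorbs non-stationarity. Gaussian emissions stand in for the copula-based noise model of the paper.

```python
# Minimal sketch: posterior restoration of X in a triplet Markov chain T = (X, U, Y).
# The pair V = (X, U) is treated as a single Markov chain, so ordinary HMC
# forward-backward machinery applies to the joint state.
import numpy as np
from scipy.stats import norm

def tmc_posterior_marginals(y, pi, A, means, stds, n_x, n_u):
    """Posterior P(x_t | y_1..y_T) for each t, marginalised over U.
    pi, A, means, stds are indexed by the joint state v = x * n_u + u."""
    T, n_v = len(y), n_x * n_u
    b = np.array([[norm.pdf(y[t], means[v], stds[v]) for v in range(n_v)]
                  for t in range(T)])
    alpha = np.zeros((T, n_v)); beta = np.zeros((T, n_v))
    alpha[0] = pi * b[0]; alpha[0] /= alpha[0].sum()
    for t in range(1, T):                       # forward pass, normalised each step
        alpha[t] = b[t] * (alpha[t - 1] @ A); alpha[t] /= alpha[t].sum()
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (b[t + 1] * beta[t + 1]); beta[t] /= beta[t].sum()
    gamma = alpha * beta; gamma /= gamma.sum(axis=1, keepdims=True)
    # Marginalise the auxiliary process U to recover P(x_t | y)
    return gamma.reshape(T, n_x, n_u).sum(axis=2)
```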


Spoken Language Technology Workshop | 2012

Transcription of multi-genre media archives using out-of-domain data

Peter Bell; Mark J. F. Gales; Pierre Lanchantin; Xunying Liu; Yanhua Long; Steve Renals; Pawel Swietojanski; Philip C. Woodland

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.
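
As a hedged sketch of the tandem idea that MLAN builds on (shapes and names here are assumptions, not the paper's recipe), the snippet below appends normalised log-posterior features from an out-of-domain DNN to in-domain PLP features; in MLAN, such first-level tandem features are themselves fed to a further in-domain DNN.

```python
# Minimal sketch: building tandem features by appending out-of-domain DNN
# posterior features to in-domain PLP features.
import numpy as np

def make_tandem(plp, ood_posteriors, eps=1e-8):
    """plp: (T, d_plp) acoustic features; ood_posteriors: (T, n_phones) DNN outputs."""
    logp = np.log(ood_posteriors + eps)              # log-posteriors are better behaved
    logp = logp - logp.mean(axis=0, keepdims=True)   # per-utterance mean normalisation
    return np.concatenate([plp, logp], axis=1)       # tandem = [PLP ; OOD posterior feats]
```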


International Conference on Acoustics, Speech, and Signal Processing | 2010

A HMM-based speech synthesis system using a new glottal source and vocal-tract separation method

Pierre Lanchantin; Gilles Degottex; Xavier Rodet

This paper introduces an HMM-based speech synthesis system which uses a new method for the Separation of Vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic Liljencrants-Fant glottal waveform model and a modulated Gaussian noise. This glottal source is first estimated and then used in the vocal-tract estimation procedure. The parameters of the source and the vocal tract are then included in HMM contextual models of phonemes. SVLN is promising for voice transformation and the synthesis of expressive speech since it allows independent control of vocal-tract and glottal-source properties. The synthesis results are finally discussed and subjectively evaluated.
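
The following is a minimal, illustrative sketch of the SVLN spectral decomposition described above, assuming the LF spectrum, vocal-tract filter and noise gain are already available on a common frequency grid; it is not the authors' implementation and the names are invented.

```python
# Minimal sketch: a frame spectrum modelled as a mixed glottal source (deterministic
# LF component plus Gaussian noise) filtered by the vocal tract and lip radiation.
import numpy as np

def svln_frame_spectrum(lf_spectrum, noise_level, vtf, freqs):
    """lf_spectrum: spectrum of the Liljencrants-Fant glottal pulse (assumed given)
    noise_level: scalar gain of the Gaussian noise component
    vtf:         vocal-tract filter frequency response C(f) on the grid `freqs` (Hz)"""
    noise = noise_level * (np.random.randn(len(freqs)) + 1j * np.random.randn(len(freqs)))
    source = lf_spectrum + noise                    # mixed glottal source
    lip_radiation = 2j * np.pi * freqs              # L(f): time-derivative term
    return source * vtf * lip_radiation             # S(f) = (G_LF + N) * C * L
```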


Speech Communication | 2013

Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Gilles Degottex; Pierre Lanchantin; Axel Roebel; Xavier Rodet

In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Using the LF model, the approach taken in this work is therefore close to a vocoder with an exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) by spectral division, as in GSS, we show that a glottal source model can be used with any envelope estimation method, in contrast to the ARX approach where a least-squares AR solution is used. We therefore derive a VTF estimate which takes into account the amplitude spectra of both the deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated, through listening tests, in the context of resynthesis, HMM-based speech synthesis, breathiness modification and pitch transposition.
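
A minimal sketch of the spectral-division step, under the assumption that a spectral envelope, a fitted LF amplitude spectrum and a noise floor are given; the function name and interface are invented for illustration and are not the paper's code.

```python
# Minimal sketch: estimating the vocal-tract filter by dividing an observed spectral
# envelope by the amplitude spectrum of the mixed glottal source, GSS-style.
import numpy as np

def estimate_vtf(envelope, lf_amplitude, noise_floor, freqs, eps=1e-12):
    """envelope:     observed spectral envelope |S(f)| from any envelope estimator
    lf_amplitude: amplitude spectrum of the fitted LF glottal model |G_LF(f)|
    noise_floor:  amplitude of the random source component"""
    source_amp = np.sqrt(lf_amplitude ** 2 + noise_floor ** 2)  # mixed-source amplitude
    lip_radiation = np.maximum(2 * np.pi * freqs, eps)          # |L(f)|, avoid f = 0
    return envelope / (source_amp * lip_radiation + eps)        # |C(f)| estimate
```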


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Cambridge University transcription systems for the Multi-Genre Broadcast challenge

Philip C. Woodland; Xunying Liu; Yanmin Qian; Chao Zhang; Mark J. F. Gales; Penny Karanasou; Pierre Lanchantin; Linlin Wang

We describe the development of our speech-to-text transcription systems for the 2015 Multi-Genre Broadcast (MGB) challenge. Key features of the systems are: a segmentation system based on deep neural networks (DNNs); the use of HTK 3.5 for building DNN-based hybrid and tandem acoustic models and the use of these models in a joint decoding framework; techniques for the adaptation of DNN-based acoustic models, including parameterised activation function adaptation; alternative acoustic models built using Kaldi; and recurrent neural network language models (RNNLMs) and RNNLM adaptation. The same language models were used with both HTK and Kaldi acoustic models, and various combined systems were built. The final systems had the lowest error rates on the evaluation data.


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

The development of the Cambridge University alignment systems for the Multi-Genre Broadcast challenge

Pierre Lanchantin; Mark J. F. Gales; Penny Karanasou; Xunying Liu; Yanmin Qian; Linlin Wang; Philip C. Woodland; Chao Zhang

We describe the alignment systems developed both for the preparation of data for the Multi-Genre Broadcast (MGB) challenge and for our participation in the transcription and alignment tasks. Captions of varying quality are aligned with the audio of TV shows that range from a few minutes to more than six hours long. Lightly supervised decoding is performed on the audio and the output text is aligned with the original text transcript. Reliable split points are found and the resulting text chunks are force-aligned with the corresponding audio segments. Confidence scores are associated with the aligned data. Multiple refinements, including audio segmentation based on deep neural networks (DNNs) and the use of DNN-based acoustic models, were used to improve performance. The final MGB alignment system had the highest F-measure value on the evaluation data.
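
The chunking step can be pictured with the hypothetical helper below, which aligns the lightly supervised decoding output against the caption text and keeps long exact word matches as reliable split points; it is only a sketch of the idea, not the MGB pipeline.

```python
# Minimal sketch: find runs of identical words between decoded output and captions;
# these runs act as anchors between which audio chunks can be force-aligned.
from difflib import SequenceMatcher

def reliable_split_points(decoded_words, caption_words, min_match=5):
    """Return (decoded_index, caption_index, length) for runs of at least
    min_match identical words."""
    matcher = SequenceMatcher(a=decoded_words, b=caption_words, autojunk=False)
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks()
            if m.size >= min_match]
```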


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Speaker diarisation and longitudinal linking in multi-genre broadcast data

Penny Karanasou; Mark J. F. Gales; Pierre Lanchantin; Xunying Liu; Yanmin Qian; Linlin Wang; Philip C. Woodland; Chao Zhang

This paper presents a multi-stage speaker diarisation system with longitudinal linking, developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non-speech segmenter. A newly developed linking stage is then added to the basic diarisation output, aiming to identify speakers across multiple episodes of the same series. The longitudinal constraint imposes incremental processing of the episodes, where speaker labels for each episode can be obtained using only material from the episode in question and those broadcast earlier in time. The nature of the data, as well as the longitudinal linking constraint, positions this diarisation task as a new and particularly challenging open research topic. Different linking clustering metrics are compared, and the lowest within-episode and cross-episode DER scores are achieved on the MGB challenge evaluation set.
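
A minimal sketch of the incremental linking constraint (the speaker representations, distance and threshold are assumptions, not the paper's method): each within-episode cluster is linked to the nearest speaker seen in earlier episodes, or given a new label, so only past material is ever used.

```python
# Minimal sketch: longitudinal linking of within-episode speaker clusters to
# speakers already seen in earlier episodes of the same series.
import numpy as np

def link_episode(episode_clusters, known_speakers, threshold):
    """episode_clusters: dict {local_label: embedding} for the new episode
    known_speakers:   dict {global_label: embedding} from earlier episodes"""
    mapping = {}
    for local, vec in episode_clusters.items():
        if known_speakers:
            best = min(known_speakers,
                       key=lambda g: np.linalg.norm(vec - known_speakers[g]))
            if np.linalg.norm(vec - known_speakers[best]) < threshold:
                mapping[local] = best                # link to a previously seen speaker
                continue
        new_label = f"spk{len(known_speakers):04d}"  # unseen speaker: new global label
        known_speakers[new_label] = vec
        mapping[local] = new_label
    return mapping
```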


International Conference on Acoustics, Speech, and Signal Processing | 2016

Improved DNN-based segmentation for multi-genre broadcast audio

Linlin Wang; Chao Zhang; Philip C. Woodland; Mark J. F. Gales; Panagiota Karanasou; Pierre Lanchantin; Xunying Liu; Yanmin Qian

Automatic segmentation is a crucial initial step in processing multi-genre broadcast (MGB) audio. It is very challenging since the data exhibits a wide range of both speech types and background conditions, with many types of non-speech audio. This paper describes a segmentation system for multi-genre broadcast audio with deep neural network (DNN)-based speech/non-speech detection. A further stage of change-point detection and clustering is used to obtain homogeneous segments. Suitable DNN inputs, context window sizes and architectures are studied with a series of experiments using a large corpus of MGB television audio. For MGB transcription, the improved segmenter yields roughly half the increase in word error rate, over manual segmentation, compared to the baseline DNN segmenter supplied for the 2015 ASRU MGB challenge.
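
As a rough sketch of how per-frame speech/non-speech posteriors become segments (the threshold, window length and frame rate are assumed values, not those of the paper), the snippet below smooths the posteriors, thresholds them and extracts segment boundaries; change-point detection and clustering would follow.

```python
# Minimal sketch: turn per-frame DNN speech posteriors into (start, end) segments
# by smoothing and thresholding.
import numpy as np

def segments_from_posteriors(speech_post, frame_shift=0.01, threshold=0.5, win=51):
    """speech_post: (T,) per-frame P(speech); returns (start_sec, end_sec) pairs."""
    kernel = np.ones(win) / win
    smoothed = np.convolve(speech_post, kernel, mode="same")   # simple moving average
    is_speech = smoothed > threshold
    edges = np.diff(is_speech.astype(int))
    starts = list(np.where(edges == 1)[0] + 1) + ([0] if is_speech[0] else [])
    ends = list(np.where(edges == -1)[0] + 1) + ([len(is_speech)] if is_speech[-1] else [])
    return [(s * frame_shift, e * frame_shift)
            for s, e in zip(sorted(starts), sorted(ends))]
```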


International Conference on Acoustics, Speech, and Signal Processing | 2011

Objective evaluation of the Dynamic Model Selection method for spectral voice conversion

Pierre Lanchantin; Xavier Rodet

Spectral voice conversion is usually performed using a single model selected to represent a trade-off between goodness of fit and complexity. Recently, we proposed a new method for spectral voice conversion, called Dynamic Model Selection (DMS), in which we assume that the model topology may change over time, depending on the source acoustic features. In this method, a set of models of increasing complexity is considered during the conversion of a source speech signal into a target speech signal. During the conversion, the best model is dynamically selected from the set according to the acoustic features of each source frame. In this paper, we present an objective evaluation demonstrating that this new method improves the conversion by reducing the transformation error compared to methods based on a single model.
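
The per-frame selection at the heart of DMS can be sketched as below, with a hypothetical model interface (source_loglik and convert are invented names): for each source frame, the model in the set that scores the frame best is chosen and used for the mapping.

```python
# Minimal sketch: dynamic, per-frame selection among conversion models of
# increasing complexity.
import numpy as np

def convert_with_dms(source_frames, models):
    """models: list of objects exposing .source_loglik(x) and .convert(x) (assumed API)."""
    converted = []
    for x in source_frames:
        scores = [m.source_loglik(x) for m in models]   # fit of each model to this frame
        best = models[int(np.argmax(scores))]           # dynamic model selection
        converted.append(best.convert(x))
    return np.array(converted)
```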


International Conference on Acoustics, Speech, and Signal Processing | 2014

Multiple-average-voice-based speech synthesis

Pierre Lanchantin; Mark J. F. Gales; Simon King; Junichi Yamagishi

This paper describes a novel approach to the speaker adaptation of statistical parametric speech synthesis systems based on the interpolation of a set of average voice models (AVMs). Recent results have shown that the quality/naturalness of adapted voices depends on the distance from the average voice model used for speaker adaptation. This suggests the use of several AVMs trained on carefully chosen speaker clusters, from which a more suitable AVM can be selected or interpolated during adaptation. In the proposed approach, a set of AVMs, a multiple-AVM, is trained on distinct clusters of speakers which are iteratively re-assigned during the estimation process, initialised according to metadata. During adaptation, each AVM from the multiple-AVM is first adapted towards the target speaker. The adapted means from the AVMs are then interpolated to yield the final speaker-adapted mean for synthesis. It is shown, by performing speaker adaptation on a corpus of British speakers with various regional accents, that the quality/naturalness of the synthetic speech of adapted voices is significantly higher than when using a single factor-independent AVM selected according to the target speaker's characteristics.
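
The interpolation step can be sketched as follows, assuming each AVM has already been adapted towards the target speaker; the weight normalisation shown here is an assumption for illustration rather than the paper's exact estimation procedure.

```python
# Minimal sketch: interpolate the adapted means of several AVMs to obtain the
# final speaker-adapted mean used for synthesis.
import numpy as np

def interpolate_adapted_means(adapted_means, weights):
    """adapted_means: (K, D) matrix, one adapted mean vector per AVM
    weights:       (K,) interpolation weights, e.g. estimated on adaptation data"""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalise so the weights sum to 1
    return w @ np.asarray(adapted_means)             # final speaker-adapted mean
```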

Collaboration


Dive into Pierre Lanchantin's collaborations.

Top Co-Authors

Xunying Liu, University of Cambridge
Chao Zhang, University of Cambridge
Yanmin Qian, Shanghai Jiao Tong University
Linlin Wang, University of Cambridge