Sabine Buchholz
Toshiba
Publications
Featured research published by Sabine Buchholz.
Conference on Computational Natural Language Learning | 2006
Sabine Buchholz; Erwin Marsi
Each year the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their systems on exactly the same data sets, in order to better compare systems. The tenth CoNLL (CoNLL-X) saw a shared task on Multilingual Dependency Parsing. In this paper, we describe how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured. We also give an overview of the parsing approaches that participants took and the results that they achieved. Finally, we try to draw general conclusions about multilingual parsing: What makes a particular language, treebank or annotation scheme easier or harder to parse, and which phenomena are challenging for any dependency parser?
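As a concrete illustration of how parsing performance was measured in the shared task, the sketch below computes unlabeled and labeled attachment scores (UAS/LAS) from files in the tab-separated CoNLL-X dependency format; the file names are hypothetical, and the official scorer additionally excludes most punctuation tokens, which is omitted here for brevity.

```python
# Minimal sketch of CoNLL-X-style dependency evaluation (UAS/LAS).
# Assumes the 10-column tab-separated CoNLL-X format, in which the
# 7th column is the head index and the 8th the dependency relation.
# File names are hypothetical placeholders.

def read_conll(path):
    """Yield (head, deprel) pairs for each token in a CoNLL-X file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:               # blank line = sentence boundary
                continue
            cols = line.split("\t")
            yield cols[6], cols[7]     # HEAD, DEPREL

def attachment_scores(gold_path, system_path):
    """Return (UAS, LAS) as fractions of correctly attached tokens."""
    total = uas = las = 0
    for (g_head, g_rel), (s_head, s_rel) in zip(
            read_conll(gold_path), read_conll(system_path)):
        total += 1
        if g_head == s_head:
            uas += 1
            if g_rel == s_rel:
                las += 1
    return uas / total, las / total

uas, las = attachment_scores("gold.conll", "system.conll")
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```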
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Heiga Zen; Norbert Braunschweiler; Sabine Buchholz; Mark J. F. Gales; Katherine Mary Knill; Sacha Krstulovic; Javier Latorre
An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
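To make the two transform types in this abstract concrete, here is a minimal numerical sketch: language-specific factors as an interpolation of cluster mean vectors, and speaker characteristics as a constrained affine (CMLLR-style) feature transform. All dimensions, weights, and variable names are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of the two transform types described above:
# language factors via cluster mean interpolation, speaker factors via
# a constrained affine (CMLLR-style) transform. Dimensions, weights,
# and variable names are hypothetical, not taken from the paper.
import numpy as np

D, K = 3, 4                                # feature dim, number of clusters
rng = np.random.default_rng(0)

cluster_means = rng.normal(size=(K, D))    # one mean vector per cluster
lang_weights = np.array([0.5, 0.2, 0.2, 0.1])  # language-specific weights

# Language-adapted state mean: weighted sum of cluster means.
mu_lang = lang_weights @ cluster_means

# Speaker-specific CMLLR-style transform of an observation vector,
# o' = A o + b, estimated per speaker in a real system.
A = np.eye(D) + 0.05 * rng.normal(size=(D, D))
b = 0.1 * rng.normal(size=D)
o = rng.normal(size=D)
o_adapted = A @ o + b

print("language-adapted mean:", mu_lang)
print("speaker-adapted observation:", o_adapted)
```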
International Conference on Acoustics, Speech, and Signal Processing | 2012
Florian Eyben; Sabine Buchholz; Norbert Braunschweiler
Current text-to-speech synthesis (TTS) systems are often perceived as lacking expressiveness, limiting the ability to fully convey information. This paper describes initial investigations into improving expressiveness for statistical speech synthesis systems. Rather than using hand-crafted definitions of expressive classes, an unsupervised clustering approach is described which is scalable to large quantities of training data. To incorporate this “expression cluster” information into an HMM-TTS system, two approaches are described: cluster questions in the decision tree construction; and average expression speech synthesis (AESS) using cluster-based linear transform adaptation. The performance of the approaches was evaluated on audiobook data in which the reader exhibits a wide range of expressiveness. A subjective listening test showed that synthesising with AESS results in speech that better reflects the expressiveness of human speech than a baseline expression-independent system.
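The abstract describes the clustering as unsupervised but does not name an algorithm; the sketch below uses k-means over hypothetical utterance-level acoustic statistics purely as an illustration of how expression cluster labels could be derived and then fed to the two approaches mentioned.

```python
# Minimal sketch of unsupervised "expression cluster" discovery. The
# abstract names the idea but not the algorithm, so k-means and the
# utterance-level features used here are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical utterance-level statistics (e.g. mean/std of F0 and
# energy) for 200 training utterances from an audiobook.
features = rng.normal(size=(200, 4))

# Cluster utterances into a small number of expression classes.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)

# Each utterance's cluster label can then serve as an "expression
# cluster" question in decision tree construction, or index a
# per-cluster linear transform for AESS-style adaptation.
labels = kmeans.labels_
print("utterances per cluster:", np.bincount(labels))
```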
International Conference on Acoustics, Speech, and Signal Processing | 2011
Javier Latorre; Mark J. F. Gales; Sabine Buchholz; Katherine Mary Knill; Masatsune Tamura; Yamato Ohtani; Masami Akamine
Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal which is used for the generation of the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model.
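The frame-level soft voiced/unvoiced decision argued for here can be pictured as a continuous blend of periodic and noise excitation driven by a per-frame voicing weight; the sketch below is a toy illustration with synthetic signals and weights, not the paper's excitation model.

```python
# Toy sketch of a frame-level soft voiced/unvoiced decision: instead of
# a hard MSD-based switch, each frame's excitation blends a crude
# periodic source (from a continuous F0 track) with noise, weighted by
# a per-frame voicing value such as the aperiodicity model might give.
# All signals and weights here are synthetic placeholders.
import numpy as np

fs, frame_len = 16000, 80              # 5 ms frames at 16 kHz (assumed)
n_frames = 100
rng = np.random.default_rng(0)

f0 = np.full(n_frames, 120.0)          # continuous F0 track (Hz), never "unvoiced"
voicing = rng.beta(5, 2, n_frames)     # soft per-frame voicing in [0, 1]

excitation = np.zeros(n_frames * frame_len)
phase = 0.0
for i in range(n_frames):
    t = np.arange(frame_len) / fs
    periodic = np.sign(np.sin(2 * np.pi * f0[i] * t + phase))  # crude pulse-like source
    phase += 2 * np.pi * f0[i] * frame_len / fs
    noise = rng.normal(size=frame_len)
    # Soft mix: no hard voiced/unvoiced switch, just a continuous blend.
    excitation[i * frame_len:(i + 1) * frame_len] = (
        voicing[i] * periodic + (1.0 - voicing[i]) * noise)

print("excitation samples:", excitation.shape[0])
```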
Conference of the International Speech Communication Association | 2010
Norbert Braunschweiler; Mark J. F. Gales; Sabine Buchholz
Conference of the International Speech Communication Association | 2011
Sabine Buchholz; Javier Latorre
Conference of the International Speech Communication Association | 2011
Norbert Braunschweiler; Sabine Buchholz
Conference of the International Speech Communication Association | 2005
Tommy Ingulfsen; Tina Burrows; Sabine Buchholz
Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment | 2013
Sabine Buchholz; Javier Latorre; Kayoko Yanagisawa
SSW | 2007
Tanya Lambert; Norbert Braunschweiler; Sabine Buchholz