Sabine Buchholz
Toshiba
Publications
Featured research published by Sabine Buchholz.
Conference on Computational Natural Language Learning | 2006
Sabine Buchholz; Erwin Marsi
Each year the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their systems on exactly the same data sets, in order to better compare systems. The tenth CoNLL (CoNLL-X) saw a shared task on Multilingual Dependency Parsing. In this paper, we describe how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured. We also give an overview of the parsing approaches that participants took and the results that they achieved. Finally, we try to draw general conclusions about multilingual parsing: What makes a particular language, treebank or annotation scheme easier or harder to parse, and which phenomena are challenging for any dependency parser?
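As a concrete illustration of how parsing performance was measured in the shared task, the sketch below computes unlabeled and labeled attachment scores (UAS/LAS) from files in the tab-separated CoNLL-X dependency format; the file names are hypothetical, and the official scorer additionally excludes most punctuation tokens, which is omitted here for brevity.

```python
# Minimal sketch of CoNLL-X-style dependency evaluation (UAS/LAS).
# Assumes the 10-column tab-separated CoNLL-X format, in which the
# 7th column is the head index and the 8th the dependency relation.
# File names are hypothetical placeholders.

def read_conll(path):
    """Yield (head, deprel) pairs for each token in a CoNLL-X file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:               # blank line = sentence boundary
                continue
            cols = line.split("\t")
            yield cols[6], cols[7]     # HEAD, DEPREL

def attachment_scores(gold_path, system_path):
    """Return (UAS, LAS) as fractions of correctly attached tokens."""
    total = uas = las = 0
    for (g_head, g_rel), (s_head, s_rel) in zip(
            read_conll(gold_path), read_conll(system_path)):
        total += 1
        if g_head == s_head:
            uas += 1
            if g_rel == s_rel:
                las += 1
    return uas / total, las / total

uas, las = attachment_scores("gold.conll", "system.conll")
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```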
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Heiga Zen; Norbert Braunschweiler; Sabine Buchholz; Mark J. F. Gales; Katherine Mary Knill; Sacha Krstulovic; Javier Latorre
An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
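To make the two transform types in this abstract concrete, here is a minimal numerical sketch: language-specific factors as an interpolation of cluster mean vectors, and speaker characteristics as a constrained affine (CMLLR-style) feature transform. All dimensions, weights, and variable names are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of the two transform types described above:
# language factors via cluster mean interpolation, speaker factors via
# a constrained affine (CMLLR-style) transform. Dimensions, weights,
# and variable names are hypothetical, not taken from the paper.
import numpy as np

D, K = 3, 4                                # feature dim, number of clusters
rng = np.random.default_rng(0)

cluster_means = rng.normal(size=(K, D))    # one mean vector per cluster
lang_weights = np.array([0.5, 0.2, 0.2, 0.1])  # language-specific weights

# Language-adapted state mean: weighted sum of cluster means.
mu_lang = lang_weights @ cluster_means

# Speaker-specific CMLLR-style transform of an observation vector,
# o' = A o + b, estimated per speaker in a real system.
A = np.eye(D) + 0.05 * rng.normal(size=(D, D))
b = 0.1 * rng.normal(size=D)
o = rng.normal(size=D)
o_adapted = A @ o + b

print("language-adapted mean:", mu_lang)
print("speaker-adapted observation:", o_adapted)
```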
International Conference on Acoustics, Speech, and Signal Processing | 2012
Florian Eyben; Sabine Buchholz; Norbert Braunschweiler
Current text-to-speech synthesis (TTS) systems are often perceived as lacking expressiveness, limiting the ability to fully convey information. This paper describes initial investigations into improving expressiveness for statistical speech synthesis systems. Rather than using hand-crafted definitions of expressive classes, an unsupervised clustering approach is described which is scalable to large quantities of training data. To incorporate this “expression cluster” information into an HMM-TTS system, two approaches are described: cluster questions in the decision tree construction; and average expression speech synthesis (AESS) using cluster-based linear transform adaptation. The performance of the approaches was evaluated on audiobook data in which the reader exhibits a wide range of expressiveness. A subjective listening test showed that synthesising with AESS results in speech that better reflects the expressiveness of human speech than a baseline expression-independent system.
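The abstract describes the clustering as unsupervised but does not name an algorithm; the sketch below uses k-means over hypothetical utterance-level acoustic statistics purely as an illustration of how expression cluster labels could be derived and then fed to the two approaches mentioned.

```python
# Minimal sketch of unsupervised "expression cluster" discovery. The
# abstract names the idea but not the algorithm, so k-means and the
# utterance-level features used here are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical utterance-level statistics (e.g. mean/std of F0 and
# energy) for 200 training utterances from an audiobook.
features = rng.normal(size=(200, 4))

# Cluster utterances into a small number of expression classes.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)

# Each utterance's cluster label can then serve as an "expression
# cluster" question in decision tree construction, or index a
# per-cluster linear transform for AESS-style adaptation.
labels = kmeans.labels_
print("utterances per cluster:", np.bincount(labels))
```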
International Conference on Acoustics, Speech, and Signal Processing | 2011
Javier Latorre; Mark J. F. Gales; Sabine Buchholz; Katherine Mary Knill; Masatsune Tamura; Yamato Ohtani; Masami Akamine
Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal which is used for the generation of the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD prior, it is better for synthesis to use a continuous F0 signal and rely on the frame-level soft voiced/unvoiced decision of the aperiodicity model.
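The frame-level soft voiced/unvoiced decision argued for here can be pictured as a continuous blend of periodic and noise excitation driven by a per-frame voicing weight; the sketch below is a toy illustration with synthetic signals and weights, not the paper's excitation model.

```python
# Toy sketch of a frame-level soft voiced/unvoiced decision: instead of
# a hard MSD-based switch, each frame's excitation blends a crude
# periodic source (from a continuous F0 track) with noise, weighted by
# a per-frame voicing value such as the aperiodicity model might give.
# All signals and weights here are synthetic placeholders.
import numpy as np

fs, frame_len = 16000, 80              # 5 ms frames at 16 kHz (assumed)
n_frames = 100
rng = np.random.default_rng(0)

f0 = np.full(n_frames, 120.0)          # continuous F0 track (Hz), never "unvoiced"
voicing = rng.beta(5, 2, n_frames)     # soft per-frame voicing in [0, 1]

excitation = np.zeros(n_frames * frame_len)
phase = 0.0
for i in range(n_frames):
    t = np.arange(frame_len) / fs
    periodic = np.sign(np.sin(2 * np.pi * f0[i] * t + phase))  # crude pulse-like source
    phase += 2 * np.pi * f0[i] * frame_len / fs
    noise = rng.normal(size=frame_len)
    # Soft mix: no hard voiced/unvoiced switch, just a continuous blend.
    excitation[i * frame_len:(i + 1) * frame_len] = (
        voicing[i] * periodic + (1.0 - voicing[i]) * noise)

print("excitation samples:", excitation.shape[0])
```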
Conference of the International Speech Communication Association | 2010
Norbert Braunschweiler; Mark J. F. Gales; Sabine Buchholz
Conference of the International Speech Communication Association | 2011
Sabine Buchholz; Javier Latorre
Conference of the International Speech Communication Association | 2011
Norbert Braunschweiler; Sabine Buchholz
Conference of the International Speech Communication Association | 2005
Tommy Ingulfsen; Tina Burrows; Sabine Buchholz
Crowdsourcing for Speech Processing: Applications to Data Collection, Transcription and Assessment | 2013
Sabine Buchholz; Javier Latorre; Kayoko Yanagisawa
SSW | 2007
Tanya Lambert; Norbert Braunschweiler; Sabine Buchholz