Hanna Silén

Tampere University of Technology

Publication

Featured research published by Hanna Silén.

IEEE Transactions on Audio, Speech, and Language Processing | 2012

Voice Conversion Using Dynamic Kernel Partial Least Squares Regression

Elina Helander; Hanna Silén; Tuomas Virtanen; Moncef Gabbouj

A drawback of many voice conversion algorithms is that they rely on linear models and/or require a lot of tuning. In addition, many of them ignore the inherent time-dependency between speech features. To address these issues, we propose using the dynamic kernel partial least squares (DKPLS) technique to model nonlinearities as well as to capture the dynamics in the data. The method is based on a kernel transformation of the source features to allow non-linear modeling and on concatenation of previous and next frames to model the dynamics. Partial least squares regression is used to find a conversion function that does not overfit to the data. The resulting DKPLS algorithm is simple and efficient and does not require massive tuning. Existing statistical methods proposed for voice conversion are able to produce good similarity between the original and the converted target voices, but the quality is usually degraded. The experiments conducted on a variety of conversion pairs show that DKPLS, being a statistical method, enables successful identity conversion while achieving a major improvement in the quality scores compared to the state-of-the-art Gaussian mixture-based model. In addition to enabling better spectral feature transformation, quality is further improved when aperiodicity and binary voicing values are converted using DKPLS with auxiliary information from spectral features.
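The three ingredients named in the abstract (kernel transformation of source frames, concatenation of the previous and next frames, and partial least squares regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian kernel, the choice of reference vectors, the edge handling, the number of components, and all function names (`gaussian_kernel`, `stack_neighbors`, `pls_fit`, `pls_predict`) are assumptions made for illustration, and the toy data is synthetic.

```python
import numpy as np

def gaussian_kernel(X, refs, sigma):
    """Map each frame in X to its Gaussian similarities with the reference vectors."""
    d2 = ((X[:, None, :] - refs[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def stack_neighbors(K):
    """Concatenate previous, current and next frame; edge frames are repeated."""
    prev = np.vstack([K[:1], K[:-1]])
    nxt = np.vstack([K[1:], K[-1:]])
    return np.hstack([prev, K, nxt])

def pls_fit(X, Y, n_components):
    """Multi-response PLS regression with deflation; returns (B, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the cross-covariance.
        u, _, _ = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w                      # latent score for this component
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        q = Yk.T @ t / tt               # Y loading
        Xk = Xk - np.outer(t, p)        # deflate both blocks
        Yk = Yk - np.outer(t, q)
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    B = W @ np.linalg.solve(P.T @ W, Q.T)   # regression matrix in input space
    return B, x_mean, y_mean

def pls_predict(X, model):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

# Toy, synthetic "parallel" data: 200 aligned frames of 3-dimensional features.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 3))
target = np.tanh(source @ rng.normal(size=(3, 3)))   # hypothetical nonlinear mapping

refs = source[::10]                                  # assumed reference vectors
features = stack_neighbors(gaussian_kernel(source, refs, sigma=1.0))
model = pls_fit(features, target, n_components=10)
converted = pls_predict(features, model)
print(converted.shape)
```

Keeping the number of PLS components well below the feature dimension is what limits overfitting in this sketch; with as many components as the rank of the feature matrix, the fit reduces to ordinary least squares.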

International Conference on Acoustics, Speech, and Signal Processing | 2012

Local linear transformation for voice conversion

Victor Popa; Hanna Silén; Jani Nurminen; Moncef Gabbouj

Many popular approaches to spectral conversion involve linear transformations determined for particular acoustic classes and compute the converted result as a linear combination between different local transformations in an attempt to ensure a continuous conversion. These methods often produce over-smoothed spectra and parameter tracks. The proposed method computes an individual linear transformation for every feature vector based on a small neighborhood in the acoustic space thus preserving local details. The method effectively reduces the over-smoothing by eliminating undesired contributions from acoustically remote regions. The method is evaluated in listening tests against the well-known Gaussian Mixture Model based conversion, representative of the class of methods involving linear transformations. Perceptual results indicate a clear preference for the proposed scheme.
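The idea of fitting a separate linear transformation for every feature vector from a small acoustic neighborhood can be sketched as follows. This is only an illustration under assumptions: the Euclidean nearest-neighbor search, the affine least-squares fit, the neighborhood size `k`, and the function name `local_linear_convert` are not taken from the paper, and the data is synthetic.

```python
import numpy as np

def local_linear_convert(x, src_train, tgt_train, k=20):
    """Convert one source vector with an affine map fitted on its k nearest
    training neighbors (hypothetical helper, not the authors' code)."""
    # Find the k training source vectors closest to x in Euclidean distance.
    d2 = ((src_train - x) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]
    # Fit target ~ [source, 1] @ A by least squares on the neighborhood only.
    S = np.hstack([src_train[idx], np.ones((len(idx), 1))])
    A, *_ = np.linalg.lstsq(S, tgt_train[idx], rcond=None)
    # Apply the locally fitted transform to x itself.
    return np.append(x, 1.0) @ A

# Synthetic paired training data: 300 aligned source/target vectors.
rng = np.random.default_rng(1)
src = rng.normal(size=(300, 3))
tgt = np.sin(src)                      # hypothetical nonlinear source-to-target relation

x_new = np.array([0.2, -0.1, 0.4])
print(local_linear_convert(x_new, src, tgt))
```

Because only nearby training pairs enter each fit, acoustically remote regions contribute nothing to the converted vector, which is the mechanism the abstract credits for reduced over-smoothing.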

Conference of the International Speech Communication Association | 2016

Recent Advances in Google Real-time HMM-driven Unit Selection Synthesizer

Xavi Gonzalvo; Siamak Tazari; Chun-an Chan; Markus Becker; Alexander Gutkin; Hanna Silén

This paper presents advances in Google’s hidden Markov model (HMM)-driven unit selection speech synthesis system. We describe several improvements to the run-time system; these include minimal latency, high quality, and a fast refresh cycle for new voices. Traditionally, unit selection synthesizers are limited in terms of the amount of data they can handle and the real applications they are built for. That is even more critical for real-life large-scale applications where high quality is expected and low latency is required given the available computational resources. In this paper we present an optimized engine to handle a large database at runtime and a composite unit search approach for combining diphones and phrase-based units. In addition, a new voice building strategy for handling big databases and keeping the building times low is presented.

International Symposium on Circuits and Systems | 2013

Evaluation of detailed modeling of the LP residual in statistical speech synthesis

Jani Nurminen; Hanna Silén; Elina Helander; Moncef Gabbouj

Speech parameterization remains an open question in statistical speech synthesis. In our earlier work we have shown that a framework developed originally for highly efficient speech storage can also be successfully applied for voice conversion and concatenative unit selection based speech synthesis. Recently, we have also used the same coding scheme in hybrid-form speech synthesis. In this paper, we further discuss the framework and apply it in statistical speech synthesis, concentrating specifically on the spectral modeling of the linear prediction (LP) residual. Perceptual evaluation demonstrates that the modeling of the spectral details remaining in the residual improves the quality of synthetic speech.

Conference of the International Speech Communication Association | 2009