Hanna Silén
Tampere University of Technology
Publication
Featured research published by Hanna Silén.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Elina Helander; Hanna Silén; Tuomas Virtanen; Moncef Gabbouj
A drawback of many voice conversion algorithms is that they rely on linear models and/or require a lot of tuning. In addition, many of them ignore the inherent time-dependency between speech features. To address these issues, we propose to use the dynamic kernel partial least squares (DKPLS) technique to model nonlinearities as well as to capture the dynamics in the data. The method is based on a kernel transformation of the source features to allow nonlinear modeling, and on concatenation of the previous and next frames to model the dynamics. Partial least squares regression is used to find a conversion function that does not overfit the data. The resulting DKPLS algorithm is simple and efficient and does not require massive tuning. Existing statistical methods proposed for voice conversion are able to produce good similarity between the original and the converted target voices, but the quality is usually degraded. The experiments conducted on a variety of conversion pairs show that DKPLS, being a statistical method, enables successful identity conversion while achieving a major improvement in the quality scores compared to the state-of-the-art Gaussian mixture-based model. In addition to enabling better spectral feature transformation, quality is further improved when aperiodicity and binary voicing values are converted using DKPLS with auxiliary information from spectral features.
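The two feature-preparation steps named in the abstract above (a kernel transformation of the source features, then concatenation of the previous and next frames) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the Gaussian kernel, the `references` vectors, the `sigma` width, and the edge handling (repeating the boundary frame) are all assumptions, and the partial least squares regression step is deliberately left out.

```python
import math

def gaussian_kernel_features(frames, references, sigma=1.0):
    """Map each frame to its Gaussian-kernel similarities with the reference vectors.

    The Gaussian kernel and the reference vectors are assumptions made for
    this sketch; the abstract only says "a kernel transformation".
    """
    out = []
    for x in frames:
        row = []
        for r in references:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, r))
            row.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        out.append(row)
    return out

def stack_neighbors(rows):
    """Concatenate the previous, current, and next frame for every time step.

    At the edges the boundary frame is repeated (an assumed convention).
    """
    n = len(rows)
    stacked = []
    for t in range(n):
        prev_row = rows[max(t - 1, 0)]
        next_row = rows[min(t + 1, n - 1)]
        stacked.append(prev_row + rows[t] + next_row)
    return stacked

# Toy data: three 2-dimensional source frames and two reference vectors.
frames = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
references = [[0.0, 0.0], [1.0, 0.0]]

kernel_rows = gaussian_kernel_features(frames, references, sigma=1.0)
dynamic_rows = stack_neighbors(kernel_rows)
```

Each row of `dynamic_rows` would then serve as the predictor vector for a regression onto the target speaker's features; in the abstract that regression is partial least squares, which this sketch does not implement.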
International Conference on Acoustics, Speech, and Signal Processing | 2012
Victor Popa; Hanna Silén; Jani Nurminen; Moncef Gabbouj
Many popular approaches to spectral conversion involve linear transformations determined for particular acoustic classes and compute the converted result as a linear combination of different local transformations in an attempt to ensure a continuous conversion. These methods often produce over-smoothed spectra and parameter tracks. The proposed method computes an individual linear transformation for every feature vector based on a small neighborhood in the acoustic space, thus preserving local details. The method effectively reduces the over-smoothing by eliminating undesired contributions from acoustically remote regions. The method is evaluated in listening tests against the well-known Gaussian mixture model-based conversion, representative of the class of methods involving linear transformations. Perceptual results indicate a clear preference for the proposed scheme.
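The idea of fitting a separate linear transformation for every feature vector from a small acoustic neighborhood can be illustrated with a toy sketch. It is heavily simplified and is not the authors' method: features are scalars rather than vectors, the neighborhood is taken to be the `k` nearest training samples, and the transformation is an ordinary least-squares line; the function name `local_linear_convert` and all parameters are hypothetical.

```python
def local_linear_convert(x, src_train, tgt_train, k=3):
    """Convert scalar x with a line fitted only to its k nearest training pairs."""
    # Pick the k source samples closest to the query (the local neighborhood).
    nearest = sorted(range(len(src_train)), key=lambda i: abs(src_train[i] - x))[:k]
    xs = [src_train[i] for i in nearest]
    ys = [tgt_train[i] for i in nearest]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((v - mean_x) ** 2 for v in xs)
    if var_x == 0.0:
        # Degenerate neighborhood: fall back to the local target mean.
        return mean_y

    # Least-squares slope and intercept estimated from the neighborhood only.
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / var_x
    return mean_y + slope * (x - mean_x)

# Toy paired data with two acoustically distant regions:
# near 0..2 the mapping is y = 2x, near 10..12 it is y = 15 - x.
src_train = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
tgt_train = [0.0, 2.0, 4.0, 5.0, 4.0, 3.0]

low = local_linear_convert(1.5, src_train, tgt_train, k=3)
high = local_linear_convert(11.5, src_train, tgt_train, k=3)
```

Because each fit sees only nearby samples, the remote region does not pull on the result, which is the intuition behind the reduced over-smoothing described in the abstract.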
Conference of the International Speech Communication Association | 2016
Xavi Gonzalvo; Siamak Tazari; Chun-an Chan; Markus Becker; Alexander Gutkin; Hanna Silén
This paper presents advances in Google’s hidden Markov model (HMM)-driven unit selection speech synthesis system. We describe several improvements to the run-time system; these include minimal latency, high quality, and a fast refresh cycle for new voices. Traditionally, unit selection synthesizers are limited in terms of the amount of data they can handle and the real applications they are built for. That is even more critical for real-life large-scale applications where high quality is expected and low latency is required given the available computational resources. In this paper we present an optimized engine to handle a large database at runtime and a composite unit search approach for combining diphones and phrase-based units. In addition, a new voice building strategy for handling big databases and keeping the building times low is presented.
International Symposium on Circuits and Systems | 2013
Jani Nurminen; Hanna Silén; Elina Helander; Moncef Gabbouj
Speech parameterization remains an open question in statistical speech synthesis. In our earlier work we have shown that a framework developed originally for highly efficient speech storage can also be successfully applied for voice conversion and concatenative unit selection based speech synthesis. Recently, we have also used the same coding scheme in hybrid-form speech synthesis. In this paper, we further discuss the framework and apply it in statistical speech synthesis, concentrating specifically on the spectral modeling of the linear prediction (LP) residual. Perceptual evaluation demonstrates that the modeling of the spectral details remaining in the residual improves the quality of synthetic speech.
Conference of the International Speech Communication Association | 2009
Hanna Silén; Elina Helander; Jani Nurminen; Moncef Gabbouj
Conference of the International Speech Communication Association | 2012
Hanna Silén; Elina Helander; Jani Nurminen; Moncef Gabbouj
Conference of the International Speech Communication Association | 2008
Elina Helander; Jan Schwarz; Jani Nurminen; Hanna Silén; Moncef Gabbouj
Conference of the International Speech Communication Association | 2014
Gerard Sanchez; Hanna Silén; Jani Nurminen; Moncef Gabbouj
Conference of the International Speech Communication Association | 2013
Hanna Silén; Jani Nurminen; Elina Helander; Moncef Gabbouj
Conference of the International Speech Communication Association | 2011
Hanna Silén; Elina Helander; Moncef Gabbouj