Publication


Featured research published by Slava Shechtman.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

A Hybrid Text-to-Speech System That Combines Concatenative and Statistical Synthesis Units

Stas Tiomkin; David Malah; Slava Shechtman; Zvi Kons

Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores natural speech feature segments selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in the stored data, and audible discontinuities may result. On the other hand, statistical TTS (STTS) systems, despite having a smaller footprint than CTTS, synthesize speech that is free of such discontinuities. Yet, in general, STTS produces less natural speech than CTTS, as it often sounds muffled. The muffling effect is due to over-smoothing of model-generated speech features. To gain the advantages of both approaches, we propose in this work to combine CTTS and STTS into a hybrid TTS (HTTS) system. Each utterance representation in HTTS is constructed from natural segments and model-generated segments in an interleaved fashion via a hybrid dynamic path algorithm. Reported listening tests demonstrate the validity of the proposed approach.
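
As a rough illustration of how a dynamic-programming search could interleave natural and model-generated segments, the sketch below runs a two-state Viterbi pass over an utterance. It is not the paper's algorithm: the abstract does not give the cost functions, so the per-segment costs, the `switch_penalty` parameter, and the function name `hybrid_path` are all assumptions made for this example.

```python
def hybrid_path(natural_cost, model_cost, switch_penalty=1.0):
    """Pick 'N' (natural) or 'M' (model) for each segment, minimising total cost.

    natural_cost[i], model_cost[i]: hypothetical per-segment costs.
    switch_penalty: hypothetical cost paid whenever the source changes
    between two consecutive segments.
    """
    n = len(natural_cost)
    if n == 0:
        return [], 0.0
    costs = {"N": natural_cost, "M": model_cost}
    best = {s: costs[s][0] for s in costs}   # best cost ending in each state
    back = []                                # back-pointers per step
    for i in range(1, n):
        new_best, pointers = {}, {}
        for s in costs:
            # Best predecessor, paying the penalty when the source changes.
            prev = min(
                best,
                key=lambda p: best[p] + (switch_penalty if p != s else 0.0),
            )
            step = switch_penalty if prev != s else 0.0
            new_best[s] = best[prev] + step + costs[s][i]
            pointers[s] = prev
        back.append(pointers)
        best = new_best
    # Trace the cheapest path backwards from the final segment.
    state = min(best, key=best.get)
    total = best[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, total
```

With a small switch penalty the search falls back to the model only where the natural segment is costly; with a large penalty it stays with a single source for the whole utterance.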


International Conference on Acoustics, Speech, and Signal Processing | 2006

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification

Dan Chazan; Ron Hoory; Ariel Sagi; Slava Shechtman; Alexander Sorin; Zhiwei Shuang; Raimo Bakis

This paper describes an efficient sinusoidal modeling framework for high-quality wideband (WB) speech synthesis and modification. This technique may serve as a basis for speech compression in the context of small-footprint concatenative text-to-speech systems. In addition, it is a useful representation for voice transformation and morphing purposes, e.g., simultaneous pitch modification and spectral envelope warping. The conventional sinusoidal modeling is enhanced with an adaptive frequency dithering mechanism, based on a degree-of-voicing analysis. A considerable reduction in the number of model parameters is achieved by high-band phase extension. The proposed model is evaluated and compared to the alternative STRAIGHT framework [1]. Being simpler and considerably more efficient than STRAIGHT, it outperforms it in speech quality for both speech reconstruction and transformation.
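
For readers unfamiliar with the conventional sinusoidal model that this work builds on, a minimal sketch follows: one frame is synthesized as a sum of stationary sinusoids, each with its own amplitude, frequency, and phase. The function name `synthesize_frame` and its parameters are assumptions chosen for illustration; the frequency dithering and high-band phase extension described in the abstract are not implemented here.

```python
import math


def synthesize_frame(amplitudes, frequencies_hz, phases, sample_rate, num_samples):
    """Return num_samples of a sum of stationary sinusoids.

    amplitudes, frequencies_hz, phases: equal-length sequences, one entry
    per sinusoidal component (hypothetical frame parameters).
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate  # time of sample n in seconds
        sample = sum(
            a * math.cos(2.0 * math.pi * f * t + p)
            for a, f, p in zip(amplitudes, frequencies_hz, phases)
        )
        frame.append(sample)
    return frame
```

For example, `synthesize_frame([1.0], [100.0], [0.0], 16000, 160)` produces one full period of a 100 Hz cosine sampled at 16 kHz.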


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Statistical Text-to-Speech Synthesis Based on Segment-Wise Representation With a Norm Constraint

Stas Tiomkin; David Malah; Slava Shechtman

In statistical HMM-based text-to-speech systems (STTS), speech feature dynamics are modeled by first- and second-order feature frame differences, which typically do not satisfactorily represent the frame-to-frame feature dynamics present in natural speech. The reduced dynamics result in over-smoothing of speech features, which often makes the synthesized speech sound muffled. In this correspondence, we propose a method to enhance a baseline STTS system by introducing a segment-wise model representation with a norm constraint. The segment-wise representation provides additional degrees of freedom in speech feature determination. We exploit these degrees of freedom for increasing the speech feature vector norm to match a norm constraint. As a result, statistically generated speech features are less over-smoothed, resulting in more natural-sounding speech, as judged by listening tests.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Quality Preserving Compression of a Concatenative Text-To-Speech Acoustic Database

Tamar Shoham; David Malah; Slava Shechtman

A concatenative text-to-speech (CTTS) synthesizer requires a large acoustic database for high-quality speech synthesis. This database consists of many acoustic leaves, each containing a number of short, compressed, speech segments. In this paper, we propose two algorithms for recompression of the acoustic database, by recompressing the data in each acoustic leaf, without compromising the perceptual quality of the obtained synthesized speech. This is achieved by exploiting the redundancy between speech frames and speech segments in the acoustic leaf. The first approach is based on a vector polynomial temporal decomposition. The second is based on 3-D shape-adaptive discrete cosine transform (DCT), followed by optimized quantization. In addition we propose a segment ordering algorithm in an attempt to improve overall performance. The developed algorithms are generic and may be applied to a variety of compression challenges. When applied to compressed spectral amplitude parameters of a specific IBM small footprint CTTS database, we obtain a recompression factor of 2 without any perceived degradation in the quality of the synthesized speech.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Coherent modification of pitch and energy for expressive prosody implantation

Alexander Sorin; Slava Shechtman; Vincent Pollet

In expressive TTS and voice transformation systems, implantation of expressive prosody derived from external out-of-domain sources often leads to extreme pitch modification that compromises the naturalness of the synthesized speech. In this work we investigate and prove a hypothesis that the naturalness loss is in part attributed to a violation of a fundamental relationship between the instantaneous pitch frequency and instantaneous energy of a speech signal. We propose an enhancement for pitch modification where the instantaneous energy is modified coherently with the pitch frequency and demonstrate the potential of this method in a subjective listening evaluation. The proposed approach is complementary to and can be combined with spectrum shape transformation methods for achieving the maximal possible quality of pitch modification.


International Symposium on Communications, Control and Signal Processing | 2010

Footprint reduction of Concatenative Text-To-Speech synthesizers using polynomial temporal decomposition

Tamar Shoham; David Malah; Slava Shechtman

High quality low footprint Concatenative Text-To-Speech (CTTS) synthesizers provide a persistent challenge in the field of speech processing. The spectral parameters representing the short speech segments used in the concatenation process constitute a large portion of the required memory. In this paper we propose to use a vectorial form of Polynomial Temporal Decomposition combined with jointly optimal segmentation and polynomial order selection in order to reduce the storage required for the spectral amplitude parameters by 50%, while preserving the perceptual quality of the obtained synthesized speech.
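
The basic idea behind polynomial temporal decomposition can be sketched as follows: the trajectory of each spectral parameter over a segment is replaced by the coefficients of a low-order least-squares polynomial, so fewer numbers need to be stored. This is only an illustration under stated assumptions. The function names, the normalised time axis, and the fixed `order` argument are hypothetical, and the jointly optimal segmentation and order selection mentioned in the abstract are not implemented.

```python
def fit_polynomial(values, order):
    """Least-squares polynomial coefficients (lowest power first) for values
    sampled at normalised times 0..1. Requires order < len(values)."""
    n = len(values)
    times = [i / (n - 1) if n > 1 else 0.0 for i in range(n)]
    size = order + 1
    # Normal equations A c = b with A[j][k] = sum t^(j+k), b[j] = sum y t^j.
    a = [[sum(t ** (j + k) for t in times) for k in range(size)] for j in range(size)]
    b = [sum(y * t ** j for y, t in zip(values, times)) for j in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate_polynomial(coeffs, num_points):
    """Reconstruct num_points samples of the trajectory from its coefficients."""
    times = [i / (num_points - 1) if num_points > 1 else 0.0 for i in range(num_points)]
    return [sum(c * t ** k for k, c in enumerate(coeffs)) for t in times]


def compress_segment(frames, order):
    """Vector form: fit one polynomial per parameter dimension of a segment.

    frames: list of equal-length parameter vectors, one per frame.
    """
    dims = len(frames[0])
    return [fit_polynomial([frame[d] for frame in frames], order) for d in range(dims)]
```

In this sketch, a segment of five frames fitted with `order=2` is stored as three coefficients per dimension instead of five values; how much perceptual quality survives depends on the order and segmentation actually chosen.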


International Conference on Acoustics, Speech, and Signal Processing | 2013

Transient modeling for overlap-add sinusoidal model of speech

Slava Shechtman

Sinusoidal modeling of speech has been successfully applied to a broad range of speech analysis, synthesis, and modification tasks. For the most part it reproduces high-quality speech; however, for speech transients (e.g., plosives, glottal stops) it suffers from reduced fidelity due to the lack of intra-frame modeling of irregularities. Various extensions of the stationary sinusoidal model have been proposed to cope with this problem. One simple and well-known approach is to incorporate an intra-frame magnitude envelope into the sinusoidal model, which has previously been done by an iterative analysis-by-synthesis procedure. In this paper we derive an optimal analytic solution for this problem. We show that this solution yields a significantly better model fit than the known analysis-by-synthesis approach.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Efficient gradient F0 tree model for prosody modeling and unit-selection, applied for the embedded US English concatenative TTS

Slava Shechtman; Ryuki Tachibana

Modeling of pitch dynamics in addition to absolute pitch modeling is highly desirable for robust pitch curve prediction and unit selection in concatenative TTS systems. Transition prosody models have been reported to improve consistency and naturalness for pitch-accent and tonal languages, such as Japanese and Mandarin. In the current work we revise a Gradient F0 tree model, originally developed for Japanese, and adjust it for American English. The resulting model requires few computational resources at runtime, which makes it highly suitable for embedded TTS applications. We report encouraging results from applying it to an embedded concatenative TTS system for American English.


Conference of the International Speech Communication Association | 2006

Frequency warping based on mapping formant parameters.

Zhiwei Shuang; Raimo Bakis; Slava Shechtman; Dan Chazan; Yong Qin


Conference of the International Speech Communication Association | 2011

Uniform Speech Parameterization for Multi-Form Segment Synthesis.

Alexander Sorin; Slava Shechtman; Vincent Pollet

Collaboration


Dive into Slava Shechtman's collaborations.

Top Co-Authors

David Malah

Technion – Israel Institute of Technology
