Zvi Kons
IBM
Publication
Featured research published by Zvi Kons.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Stas Tiomkin; David Malah; Slava Shechtman; Zvi Kons
Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores natural speech feature segments selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available, and audible discontinuities may result. On the other hand, statistical TTS (STTS) systems, despite having a smaller footprint than CTTS, synthesize speech that is free of such discontinuities. Yet, in general, STTS produces lower quality speech than CTTS in terms of naturalness, as it often sounds muffled. The muffling effect is due to over-smoothing of model-generated speech features. To gain the advantages of both approaches, we propose in this work to combine CTTS and STTS into a hybrid TTS (HTTS) system. Each utterance representation in HTTS is constructed from natural segments and model-generated segments in an interleaved fashion via a hybrid dynamic path algorithm. Reported listening tests demonstrate the validity of the proposed approach.
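The interleaving idea can be pictured with a generic Viterbi-style dynamic-programming sketch. The Python below is only an illustration under assumed Euclidean target and join costs and a fixed penalty for choosing a model-generated segment; the function select_hybrid_path, its parameters, and the cost definitions are placeholder assumptions, not the paper's actual algorithm.

```python
# Illustrative dynamic-programming sketch of hybrid segment selection.
# NOT the paper's algorithm: costs and data layout are placeholder assumptions
# used only to show how natural and model-generated segments can be interleaved.

import math

def select_hybrid_path(targets, natural_db, model_segments,
                       join_weight=1.0, model_penalty=0.5):
    """For each target unit, choose either a natural candidate (from a small
    database) or the model-generated segment, minimizing the sum of target
    costs and join (discontinuity) costs along the utterance.

    targets        : list of target feature vectors (one per unit)
    natural_db     : natural_db[t] holds candidate feature vectors for unit t
                     (may be empty)
    model_segments : list of model-generated feature vectors (one per unit)
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Candidates per unit: natural candidates plus the model-generated one.
    # Each candidate is (features, is_model).
    candidates = [
        [(c, False) for c in natural_db[t]] + [(model_segments[t], True)]
        for t in range(len(targets))
    ]

    # Viterbi-style DP over candidates.
    best = [dist(f, targets[0]) + (model_penalty if is_m else 0.0)
            for f, is_m in candidates[0]]
    back = [[None] * len(candidates[0])]

    for t in range(1, len(targets)):
        new_best, new_back = [], []
        for f, is_m in candidates[t]:
            tgt_cost = dist(f, targets[t]) + (model_penalty if is_m else 0.0)
            # Join cost: feature discontinuity to each previous candidate.
            scores = [best[j] + join_weight * dist(candidates[t - 1][j][0], f)
                      for j in range(len(candidates[t - 1]))]
            j_best = min(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[j_best] + tgt_cost)
            new_back.append(j_best)
        best, back = new_best, back + [new_back]

    # Trace back the optimal interleaved path.
    idx = min(range(len(best)), key=best.__getitem__)
    path = [idx]
    for t in range(len(targets) - 1, 0, -1):
        idx = back[t][idx]
        path.append(idx)
    path.reverse()
    return [("model" if candidates[t][i][1] else "natural")
            for t, i in enumerate(path)]
```

Because the model-generated segment is always available as a candidate for every unit, the path can fall back on it whenever no natural segment fits well, which is one way to picture why such a hybrid avoids audible discontinuities at small footprints.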
Journal of the Acoustical Society of America | 2007
Dan Chazan; Zvi Kons
A speech decoder and a segment aligner are provided in the present invention. The speech decoder may include a spectrum reconstructor operative to reconstruct the spectrum of a speech segment from the amplitude envelope of the spectrum of said speech segment and pitch information, and a phase combiner operative to reconstruct the complex spectrum of the speech segment from the reconstructed spectrum, phase information describing the speech segment, and pitch information describing the speech segment. The speech decoder may further include a delay operative to store a complex spectrum of a previous speech segment; and a segment aligner operative to determine the relative offset between the complex spectrum of the speech segment and the complex spectrum of the previous speech segment, align the position of the first pitch excitation of the current speech segment to the last pitch excitation of the previous speech segment, and apply a time shift and a complex Hilbert filter to said complex spectra, wherein the segment aligner is operative to cross-correlate the complex spectra as $C(\tau) = \sum_{n=0}^{N} F_n \bar{G}_m \, e^{-2\pi i n \tau}$, with $m = \lfloor n\,p_G/p_F + 0.5 \rfloor$, where $F_n$ and $G_m$ are the computed complex magnitudes of pitch harmonics $n$ and $m$ of the current and previous spectra, respectively, and $p_F$ and $p_G$ are their corresponding pitch periods.
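The cross-correlation above can be evaluated directly from the harmonic amplitudes. The short Python sketch below assumes F and G are arrays of complex harmonic amplitudes and p_F, p_G their pitch periods; the function name harmonic_cross_correlation and the grid of offsets are illustrative assumptions, not part of the patent text.

```python
# Minimal sketch of the cross-correlation C(tau) described above,
# assuming F, G are arrays of complex harmonic amplitudes and
# p_F, p_G their pitch periods. Names are illustrative only.

import numpy as np

def harmonic_cross_correlation(F, G, p_F, p_G, taus):
    """Evaluate C(tau) = sum_n F_n * conj(G_m) * exp(-2*pi*i*n*tau),
    with m = floor(n * p_G / p_F + 0.5), for each tau in `taus`.

    Harmonics of the previous segment (G) are matched to those of the
    current segment (F) by rounding n * p_G / p_F to the nearest index;
    harmonics that fall outside G are skipped.
    """
    F = np.asarray(F, dtype=complex)
    G = np.asarray(G, dtype=complex)
    n = np.arange(len(F))
    m = np.floor(n * p_G / p_F + 0.5).astype(int)
    valid = m < len(G)                       # keep only matchable harmonics
    n, m = n[valid], m[valid]

    taus = np.asarray(taus, dtype=float)
    # One complex exponential term per (tau, n) pair, summed over n.
    phases = np.exp(-2j * np.pi * np.outer(taus, n))
    return phases @ (F[n] * np.conj(G[m]))

# The relative offset that best aligns the two segments would then be the
# tau maximizing |C(tau)| over a grid of candidate offsets.
```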
Conference of the International Speech Communication Association | 2013
Zvi Kons; Orith Toledo-Ronen
Conference of the International Speech Communication Association | 2005
Dan Chazan; Ron Hoory; Zvi Kons; Ariel Sagi; Slava Shechtman; Alexander Sorin
Archive | 2005
Dan Chazan; Ron Hoory; Zvi Kons; Slava Shechtman; Alexander Sorin
Archive | 2008
Raul Fernandez; Zvi Kons; Slava Shechtman; Zhi Wei Shuang; Ron Hoory; Bhuvana Ramabhadran; Yong Qin
Archive | 2011
Shay Ben-David; Ron Hoory; Zvi Kons; David Nahamoo
Archive | 2002
Dan Chazan; Zvi Kons
Archive | 2006
Ellen Eide; Raul Fernandez; Ron Hoory; Wael Hamza; Zvi Kons; Michael Picheny; Ariel Sagi; Slava Shechtman; Zhi Wei Shuang
Archive | 2017
Zvi Kons