Dan Chazan | Researchain

Publication

Featured research published by Dan Chazan.

IEEE Transactions on Signal Processing | 1991

Super resolution pitch determination of speech signals

Yoav Medan; Eyal Yair; Dan Chazan

Based on a new similarity model for the voice excitation process, a novel pitch determination procedure is derived. The unique features of the proposed algorithm are infinite (super) resolution, better accuracy than the difference limen for F0, robustness to noise, reliability, and modest computational complexity. The algorithm is instrumental to speech processing applications which require pitch synchronous spectral analysis. The computational complexity of the proposed algorithm is well within the capacity of modern digital signal processing (DSP) technology and therefore can be implemented in real time.

International Conference on Acoustics, Speech, and Signal Processing | 2000

Speech reconstruction from mel frequency cepstral coefficients and pitch frequency

Dan Chazan; Ron Hoory; Gilad Cohen; Meir Zibulski

This paper presents a novel low complexity, frequency domain algorithm for reconstruction of speech from the mel-frequency cepstral coefficients (MFCC), commonly used by speech recognition systems, and the pitch frequency values. The reconstruction technique is based on the sinusoidal speech representation. A set of sine-wave frequencies is derived using the pitch frequency and voicing decisions, and synthetic phases are then assigned to each respective sine wave. The sine-wave amplitudes are generated by sampling a linear combination of frequency domain basis functions. The basis function gains are determined such that the mel-frequency binned spectrum of the reconstructed speech is similar to the mel-frequency binned spectrum, obtained from the original MFCC vector by IDCT and antilog operations. Natural sounding, good quality intelligible speech is obtained by this procedure.
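The "IDCT and antilog" step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's reconstruction algorithm: it only shows how a mel-frequency binned spectrum can be recovered from an MFCC vector, assuming an unnormalized DCT-II and natural logarithms. Real front-ends use their own scaling, log base, and liftering, and the function names `mfcc_from_mel` and `mel_from_mfcc` are hypothetical.

```python
import math


def mfcc_from_mel(mel_energies, num_coeffs):
    """Forward step (assumed convention): log of the mel bins, then unnormalized DCT-II."""
    log_e = [math.log(e) for e in mel_energies]
    n = len(log_e)
    return [
        sum(log_e[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(num_coeffs)
    ]


def mel_from_mfcc(mfcc, num_bins):
    """Inverse step: IDCT of the cepstral vector, then antilog (exp).

    If len(mfcc) < num_bins, the missing high-order coefficients are treated
    as zero, so the result is a smoothed mel-binned spectrum.
    """
    mel = []
    for i in range(num_bins):
        log_e = mfcc[0] / num_bins
        for k in range(1, len(mfcc)):
            log_e += (2.0 / num_bins) * mfcc[k] * math.cos(
                math.pi * k * (i + 0.5) / num_bins
            )
        mel.append(math.exp(log_e))
    return mel


# Round trip with a full-length cepstral vector recovers the mel bins.
original = [1.0, 2.0, 4.0, 8.0]
recovered = mel_from_mfcc(mfcc_from_mel(original, 4), 4)
```

With fewer coefficients than mel bins, as is typical for recognition front-ends, the recovered spectrum is only an approximation of the original binned spectrum.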

International Conference on Acoustics, Speech, and Signal Processing | 1993

Optimal multi-pitch estimation using the EM algorithm for co-channel speech separation

Dan Chazan; Yoram Stettiner; David Malah

The problem of optimally estimating (in the maximum-likelihood sense) the pitch of each of several speakers talking simultaneously is addressed. This information is needed in systems which perform co-channel speech separation. A multipitch model is proposed which is used in conjunction with an EM (expectation maximization)-based iterative estimation scheme. The pitch period of each speaker is allowed to vary linearly in the analysis interval, thus offering improved co-channel speech separation. The proposed algorithm is shown to outperform standard pitch detection algorithms in detecting the pitch of simultaneous speakers. The proposed multipitch detection algorithm has potential in improving the performance of speaker separation and in interference suppression systems.

IEEE Transactions on Audio, Speech, and Language Processing | 2007

A Large Margin Algorithm for Speech-to-Phoneme and Music-to-Score Alignment

Joseph Keshet; Shai Shalev-Shwartz; Yoram Singer; Dan Chazan

We describe and analyze a discriminative algorithm for learning to align an audio signal with a given sequence of events that tag the signal. We demonstrate the applicability of our method for the tasks of speech-to-phoneme alignment ("forced alignment") and music-to-score alignment. In the first alignment task, the events that tag the speech signal are phonemes while in the music alignment task, the events are musical notes. Our goal is to learn an alignment function whose input is an audio signal along with its accompanying event sequence and its output is a timing sequence representing the actual start time of each event in the audio signal. Generalizing the notion of separation with a margin used in support vector machines for binary classification, we cast the learning task as the problem of finding a vector in an abstract inner-product space. To do so, we devise a mapping of the input signal and the event sequence along with any possible timing sequence into an abstract vector space. Each possible timing sequence therefore corresponds to an instance vector and the predicted timing sequence is the one whose projection onto the learned prediction vector is maximal. We set the prediction vector to be the solution of a minimization problem with a large set of constraints. Each constraint enforces a gap between the projection of the correct target timing sequence and the projection of an alternative, incorrect, timing sequence onto the vector. Though the number of constraints is very large, we describe a simple iterative algorithm for efficiently learning the vector and analyze the formal properties of the resulting learning algorithm. We report experimental results comparing the proposed algorithm to previous studies on speech-to-phoneme and music-to-score alignment, which use hidden Markov models. The results obtained in our experiments using the discriminative alignment algorithm are comparable to results of state-of-the-art systems.

International Conference on Acoustics, Speech, and Signal Processing | 2004

The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation

Alexander Sorin; Tenkasi V. Ramabadran; Dan Chazan; Ron Hoory; Michael J. McLaughlin; David Pearce; Fan Cr Wang; Yaxin Zhang

We present work that has been carried out in developing the ETSI extended DSR standards ES 202 211 and ES 202 212 (2003). These standards extend the previous ETSI DSR standards: basic front-end ES 201 108 and advanced (noise robust) front-end ES 202 050 respectively. The extensions enable enhanced tonal language recognition as well as server-side speech reconstruction capability. The paper discusses the client-side estimation of pitch and voicing class parameters whereas a companion paper discusses the server-side speech reconstruction. Experimental results show enhancement of tonal language recognition rates of proprietary recognition engines, when the standard extensions are used.

International Conference on Acoustics, Speech, and Signal Processing | 2006

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification

Dan Chazan; Ron Hoory; Ariel Sagi; Slava Shechtman; Alexander Sorin; Zhiwei Shuang; Raimo Bakis

This paper describes an efficient sinusoidal modeling framework for high quality wide band (WB) speech synthesis and modification. This technique may serve as a basis for speech compression in the context of small footprint concatenative Text to Speech systems. In addition, it is a useful representation for voice transformation and morphing purposes, e.g., simultaneous pitch modification and spectral envelope warping. The conventional sinusoidal modeling is enhanced with an adaptive frequency dithering mechanism, based on a degree of voicing analysis. Considerable reduction of the amount of model parameters is achieved by high band phase extension. The proposed model is evaluated and compared to the alternative STRAIGHT framework [1]. Being simpler and considerably more efficient than STRAIGHT, it outperforms it in speech quality for both speech reconstruction and transformation.

Journal of Combinatorial Theory | 1968

A note on time-sharing

Dan Chazan; Alan G. Konheim; Benjamin Weiss

The setting for the problem discussed here is a service facility which is to be "time-shared" by two customers. A precise notion of a processing schedule, which prescribes the times at which the facility is available to each customer, is introduced. Associated with each schedule is the expected total waiting time of the two customers. The schedules which minimize this time are called optimum schedules and are determined here. A number of examples and extensions are given which indicate the scope of the methods used.

International Conference on Pattern Recognition | 1994

Dynamic time warping with path control and non-local cost

Yoram Stettiner; David Malah; Dan Chazan

Dynamic time warping (DTW) is a dynamic programming technique widely used for solving time-alignment problems. The classical DTW constrains only the first derivative of the warping function, hence allowing no direct control over the warping function curvature. Moreover, it implicitly assumes, inappropriately for some applications, that the noise is white. We propose a multidimensional dynamic-programming technique which can efficiently solve time-warping optimization problems involving colored noise, and allows control over the warping function curvature. The technique is demonstrated for the co-channel speech separation problem. Applications employing DTW can benefit from the new technique, which offers improved accuracy and robustness in the presence of colored noise and competing speech.
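For reference, the classical DTW that the abstract takes as its starting point can be sketched as below. This is the textbook baseline, not the multidimensional technique proposed in the paper: the absolute-difference local cost, the three-step pattern, and the function name `dtw_distance` are assumptions chosen for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classical DTW cost between two numeric sequences.

    Uses a purely local cost |x - y| and the standard step pattern
    (insertion, deletion, match), so only the slope of the warping
    path is constrained, not its curvature.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# A time-stretched copy of a sequence aligns at zero cost.
stretched_cost = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because each cell depends only on its three neighbours and a local cost, this recursion has no way to penalize path curvature or to model correlated (colored) noise, which is the limitation the abstract points to.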

International Conference on Pattern Recognition | 1994

A statistical parametric model for recognition and synthesis of handwriting

Orly Stettiner; Dan Chazan

A new parametric model for excellent quality synthesis of English script in online systems is presented, which uses a fixed length letter segmentation. The script vertical and horizontal velocities are modeled as the impulse response of a 2D linear time-varying second-order system. An analysis-by-synthesis approach is proposed, in which the model's parameters are estimated by solving a nonlinear optimization problem. In addition, a statistical model for the ABC letters is presented, which is based on the parametric representation. This composite statistical and parametric model may be utilized in applications such as writer imitation, character recognition and writer identification.

SIAM Journal on Applied Mathematics | 1976

Errata: On the Optimality of the Exponential Functions for Some Minimax Problems

Shmuel Gal; Dan Chazan

We consider a class of games in which the first player chooses a positive function and the second player chooses a number. We show that under certain conditions, the minimax strategy for the first player is an exponential function. The results obtained are applied to several search games.
