Ran D. Zilca
IBM
Publication
Featured research published by Ran D. Zilca.
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Ran D. Zilca; Brian Kingsbury; Jiri Navratil; Ganesh N. Ramaswamy
The fine spectral structure related to pitch information is conveyed in Mel cepstral features, with variations in pitch causing variations in the features. For speaker recognition systems, this phenomenon, known as “pitch mismatch” between training and testing, can increase error rates. Likewise, pitch-related variability may increase error rates in speech recognition systems for languages such as English, in which pitch does not carry phonetic information. In addition, for both speech recognition and speaker recognition systems, the raw speech signal is traditionally parsed into frames using a constant frame size and a constant frame offset, without aligning the frames to the natural pitch cycles. As a result, the power spectral estimation performed as part of the Mel cepstral computation may include artifacts. Pitch synchronous methods have addressed this problem in the past, at the expense of added complexity from a variable frame size and/or offset. This paper introduces Pseudo Pitch Synchronous (PPS) signal processing procedures that attempt to align each individual frame to its natural cycle and avoid truncation of pitch cycles while still using a constant frame size and frame offset, in an effort to address the above problems. Text-independent speaker recognition experiments performed on NIST speaker recognition tasks demonstrate a performance improvement when the scores produced by systems using PPS are fused with traditional speaker recognition scores. In addition, a better distribution of errors across trials may be obtained for similar error rates, and some insight into the role of the fundamental frequency in speaker recognition is revealed. Speech recognition experiments run on the Aurora-2 noisy digits task also show improved robustness and better accuracy for extremely low signal-to-noise ratio (SNR) data.
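The framing issue described in this abstract can be illustrated with a small sketch. It shows only the conventional constant-size, constant-offset framing and where each frame happens to start within a pitch cycle; it is not the PPS procedure itself, which the abstract does not specify. The function names, the 8 kHz sampling rate, and the perfectly periodic 64-sample pitch period are all assumptions made for illustration.

```python
def frame_starts(num_samples, frame_size, frame_offset):
    """Start indices of conventional fixed-size, fixed-offset frames."""
    if frame_size <= 0 or frame_offset <= 0:
        raise ValueError("frame_size and frame_offset must be positive")
    # Only frames that fit entirely inside the signal are kept.
    return list(range(0, num_samples - frame_size + 1, frame_offset))


def phase_in_pitch_cycle(starts, pitch_period):
    """Position (in samples) of each frame start within its pitch cycle.

    Assumes an idealized, perfectly periodic signal whose first cycle
    begins at sample 0 (a hypothetical simplification).
    """
    return [s % pitch_period for s in starts]


# Hypothetical setup: 8 kHz audio, 25 ms frames (200 samples),
# 10 ms offset (80 samples), 125 Hz pitch (64-sample period).
starts = frame_starts(num_samples=1000, frame_size=200, frame_offset=80)
phases = phase_in_pitch_cycle(starts, pitch_period=64)
print(starts[:4])
print(phases[:4])
```

With these assumed values the first four frames start 0, 16, 32, and 48 samples into a pitch cycle, so each frame cuts the cycles at a different point. That drifting start position is the misalignment that pitch synchronous and pseudo pitch synchronous methods aim to reduce.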
International Conference on Acoustics, Speech, and Signal Processing | 2003
Ganesh N. Ramaswamy; A. Navratil; Upendra V. Chaudhari; Ran D. Zilca
This paper presents an overview of the architecture and algorithms implemented in IBM's text-independent speaker verification system developed for the 2002 NIST speaker recognition evaluation, particularly for the 1-speaker detection task using cellular test data. We describe individual components including a Gaussianization front-end, cellular-codec post-processing, modeling, discriminative optimization, and scoring steps. A combination of multiple data-perturbed systems, using a discriminative objective to optimize performance in the low-false-alarm operating region, obtained the top performance in the NIST 2002 1-speaker detection task.
International Conference on Acoustics, Speech, and Signal Processing | 2002
Ran D. Zilca; Upendra V. Chaudhari; Ganesh N. Ramaswamy
This paper describes the properties of the symmetric sphericity measure between two Gaussian classes and presents experimental results of its use in text-independent speaker verification performed on cellular speech. A novel geometric interpretation of the sphericity measure is presented, emphasizing its robustness to variations in scaling in feature space compared to other distortion measures, along with an associated score normalization procedure. The experimental results clearly indicate the superiority of this method over the prevailing likelihood ratio approach, motivating further use of the sphericity measure in a multi-component modeling scheme. In addition, the direct use of this method with single Gaussian models is computationally extremely simple and may provide acceptable performance in certain cases.
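As a rough illustration of what a symmetric sphericity measure between two Gaussian classes can look like, the sketch below uses the arithmetic-harmonic sphericity form, log(tr(Y X^-1) · tr(X Y^-1)) − 2 log d, where X and Y are the class covariances and d is the feature dimension. The abstract does not state the exact definition used in the paper, so this choice is an assumption, as are the function name and the restriction to diagonal covariances (made only to keep the example free of matrix inversion).

```python
import math


def ah_sphericity(var_x, var_y):
    """Arithmetic-harmonic sphericity between two diagonal-covariance Gaussians.

    var_x, var_y: per-dimension variances (the covariance diagonals).
    This is an assumed form of the measure, not necessarily the paper's.
    """
    if len(var_x) != len(var_y) or not var_x:
        raise ValueError("variance vectors must be non-empty and equal length")
    if any(v <= 0 for v in list(var_x) + list(var_y)):
        raise ValueError("variances must be positive")
    d = len(var_x)
    # For diagonal covariances, tr(Y X^-1) is the sum of elementwise ratios.
    tr_yx = sum(y / x for x, y in zip(var_x, var_y))
    tr_xy = sum(x / y for x, y in zip(var_x, var_y))
    return math.log(tr_yx * tr_xy) - 2.0 * math.log(d)


a = [1.0, 2.0, 3.0]
b = [3.0, 1.0, 2.0]
print(ah_sphericity(a, b))                     # symmetric in its arguments
print(ah_sphericity(a, [2.0 * v for v in a]))  # numerically zero: pure rescaling
```

In this form the measure is non-negative, symmetric in its two arguments, and unchanged when one covariance is multiplied by a scalar, because the scale factor cancels between the two traces. It also depends only on second-order statistics, which is why its cost with single Gaussian models is low.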
International Conference on Acoustics, Speech, and Signal Processing | 2005
James H. Nealand; Jason W. Pelecanos; Ran D. Zilca; Ganesh N. Ramaswamy
The relative importance of the temporal characteristics of speech for text-dependent and text-constrained speaker verification is investigated. A novel scheme is proposed that uses a common set of Gaussian components to form various HMM and GMM configurations, establishing a systematic transition from text-dependent to text-constrained speaker verification and resulting in a novel alternative to conventional GMM-UBM training. Experimental results indicate that the intra-word temporal characteristics of speech do not contribute significantly to performance; however, the inter-word temporal characteristics can be used during both enrollment and testing to improve verification performance.
Archive | 2004
Upendra V. Chaudhari; Ryan L. Osborn; Jason W. Pelecanos; Ganesh N. Ramaswamy; Ran D. Zilca
Archive | 2002
Ganesh N. Ramaswamy; Ran D. Zilca; Oleg Alecksandrovich
Archive | 2002
Upendra V. Chaudhari; Ganesh N. Ramaswamy; Ran D. Zilca
Archive | 2003
Upendra V. Chaudhari; Jiri Navratil; Ganesh N. Ramaswamy; Ran D. Zilca
Archive | 2008
Jiri Navratil; Ryan L. Osborn; Jason W. Pelecanos; Ganesh N. Ramaswamy; Ran D. Zilca
Archive | 2004
Jiri Navratil; Ganesh N. Ramaswamy; Ran D. Zilca