Itshak Lapidot
University of Avignon
Publication
Featured research published by Itshak Lapidot.
IEEE Transactions on Neural Networks | 2002
Itshak Lapidot; Hugo Guterman; Arnon D. Cohen
We present a method for clustering the speakers in an unlabeled and unsegmented conversation (with a known number of speakers) when no a priori knowledge about the identity of the participants is given. Each speaker was modeled by a self-organizing map (SOM). The SOMs were randomly initialized. An iterative algorithm allows the data to move from one model to another and adjusts the SOMs. The restriction that the data can move only in small groups, rather than each feature vector separately, forces the SOMs to adjust to speakers (instead of phonemes or other vocal events). This method was applied to high-quality conversations with two to five participants and to two-speaker telephone-quality conversations. For two speakers (both high- and telephone-quality) and for three speakers, over 80% of the segmentation was correct. The problem becomes even harder when the number of participants is also unknown. Based on the iterative clustering algorithm, a validity criterion was also developed to estimate the number of speakers. In 16 out of 17 high-quality conversations between two and three participants, the estimated number of participants was correct. For telephone-quality conversations the results were poorer.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Oshry Ben-Harush; Itshak Lapidot; Hugo Guterman
Speaker diarization systems attempt to assign temporal segments from a conversation between R speakers to an appropriate speaker r. This task is generally performed when no prior information is given regarding the speakers. The number of speakers is usually unknown and needs to be estimated. However, there are applications where the number of speakers is known in advance. The diarization process generally consists of change detection, clustering and labeling of a given audio stream. Speaker diarization can be performed using an iterative approach that is optimized by the selection of appropriate initial conditions. This study examines the influence of several common initialization algorithms, including two variants of a recently proposed K-means-based initialization algorithm, on the performance of an iterative speaker diarization system applied to two-speaker telephone conversations. The suggested speaker diarization system employs either self-organizing maps or Gaussian mixture models to model the speakers and the non-speech in the conversation. The diarization system and initialization algorithms are tuned using 108 telephone conversations taken from the LDC CallHome corpus, which form the development set. The evaluation subset is composed of 2048 telephone conversations extracted from the NIST 2005 Rich Transcription corpus. The results show that initializing the speaker diarization system with the K-means-based algorithms provides a relative improvement of 10.4% for the LDC development set and 12.2% for the NIST evaluation subset when compared to random initialization after 12 iterations, the number required for the diarization process to converge from random initialization. With the K-means-based initialization, however, only five iterations are required for the system to converge. Thus, the new initialization improves performance in terms of both diarization error rate and speed of convergence.
International Workshop on Machine Learning for Signal Processing | 2009
Oshry Ben-Harush; Hugo Guterman; Itshak Lapidot
Speaker diarization systems attempt to assign temporal speech segments in a conversation to the appropriate speaker, and non-speech segments to non-speech. In essence, speaker diarization systems answer the question “Who spoke when?”.
Spectroscopy | 2012
Ahmad Salman; Itshak Lapidot; A. Pomerantz; Leah Tsror; Z. Hammody; R. Moreh; Mahmoud Huleihel; S. Mordechai
Fungi are considered serious pathogens for many plants, potentially causing severe economic damage. Early detection and identification of these pathogens is crucial for their timely control. The methods available for identification of fungi are time consuming and not always very specific. In this study, the potential of FTIR-ATR spectroscopy was examined together with advanced mathematical and statistical methods, namely principal component analysis (PCA) and linear discriminant analysis (LDA), to differentiate among 10 isolates of Fusarium oxysporum. The results are encouraging and indicate that FTIR-ATR can successfully detect different isolates of Fusarium oxysporum. Based on PCA and LDA calculations in the region 850–1775 cm⁻¹ with 16 PCs, the different strains from the same fungal genus could be classified with 75.3% and 69.5% success rates using the “leave one out” method and the “20–80% algorithm”, respectively.
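The “leave one out” evaluation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the PCA and LDA stages are replaced here by a simple nearest-centroid classifier (an assumption made to keep the example dependency-free), and the function names and toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT PCA+LDA): label of the closest class mean."""
    groups = {}
    for vec, lab in zip(train_x, train_y):
        groups.setdefault(lab, []).append(vec)
    centroids = {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def leave_one_out_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy "spectra" reduced to two features, two classes.
xs = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
ys = ["a", "a", "a", "b", "b", "b"]
print(leave_one_out_accuracy(xs, ys, nearest_centroid_predict))
```

Each sample is classified by a model that never saw it, so the reported rate is an estimate of performance on unseen data; any classifier with the same `predict(train_x, train_y, x)` shape could be swapped in.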
Odyssey 2016 | 2016
Itay Salmun; Irit Opher; Itshak Lapidot
This paper extends previous work that used the mean shift algorithm to perform speaker clustering on i-vectors generated from short speech segments. We examine the effectiveness of probabilistic linear discriminant analysis (PLDA) scoring as the metric of the mean shift clustering algorithm in the presence of different numbers of speakers. Our proposed method, combined with k-nearest neighbors (kNN) for bandwidth estimation, yields better and more robust results than cosine similarity with a fixed neighborhood bandwidth when clustering segments from a large number of speakers. In the case of 30 speakers, we achieved an evaluation parameter K of 72.1 with the PLDA-based mean shift algorithm compared to 65.9 with the cosine-based baseline system.
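The idea of replacing a fixed neighborhood bandwidth with a kNN neighborhood can be sketched as follows. This is an illustrative simplification, not the paper's system: cosine similarity stands in for PLDA scoring (which would require a trained model), a flat kernel over the k most similar vectors is assumed, and all names and the toy data are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def knn_mean_shift(vectors, k, n_iter=50, tol=1e-9):
    """Shift every point to the normalized mean of its k most similar data points."""
    data = [normalize(v) for v in vectors]
    modes = [list(v) for v in data]
    for _ in range(n_iter):
        moved = 0.0
        new_modes = []
        for y in modes:
            # The kNN neighborhood replaces a fixed similarity threshold.
            neighbors = sorted(data, key=lambda x: cosine(x, y), reverse=True)[:k]
            mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
            shifted = normalize(mean)
            moved = max(moved, 1.0 - cosine(shifted, y))
            new_modes.append(shifted)
        modes = new_modes
        if moved < tol:
            break
    return modes


def group_modes(modes, merge_threshold=0.99):
    """Give the same cluster label to points whose modes nearly coincide."""
    labels, representatives = [], []
    for m in modes:
        for idx, rep in enumerate(representatives):
            if cosine(m, rep) >= merge_threshold:
                labels.append(idx)
                break
        else:
            representatives.append(m)
            labels.append(len(representatives) - 1)
    return labels


# Hypothetical 2-D "i-vectors" from two speakers.
points = [[1, 0.0], [1, 0.1], [1, -0.1], [0, 1], [0.1, 1], [-0.1, 1]]
print(group_modes(knn_mean_shift(points, k=3)))
```

Because the neighborhood always contains exactly k vectors, it adapts to the local density instead of depending on one global threshold, which is the property the abstract credits for the improved robustness.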
IEEE Convention of Electrical and Electronics Engineers in Israel | 2012
Itshak Lapidot; Jean-François Bonastre
In this work we examine whether linear discriminant analysis (LDA) can improve diarization performance when used as an additional phase in a telephone conversation diarization system. We first apply a classical diarization system. Using the system's output to define the classes of interest, an LDA transformation is applied to the mel-cepstrum features. Then the final diarization process is applied to the transformed features. A relative improvement of 14.8% was obtained on the LDC America CallHome database. The LDA appeared sensitive to both segment duration and the amount of data available for training, as shown by the results obtained on the NIST SRE-05 database, where no significant improvement was observed.
Iberoamerican Congress on Pattern Recognition | 2015
Moez Ajili; Jean-François Bonastre; Solange Rossato; Juliette Kahn; Itshak Lapidot
In forensic voice comparison, it is strongly recommended to follow the Bayesian paradigm when presenting forensic evidence to the court. In this paradigm, the strength of the forensic evidence is summarized by a likelihood ratio (LR). In the real world, however, relying only on the LR without considering its degree of reliability does not allow experts to make a sound judgement. This work is mainly motivated by the need to quantify this reliability. In this context, we think that the presence of speaker-specific information and its homogeneity between the two signals to be compared should be evaluated. This paper is dedicated to the latter, the homogeneity. We propose an information-theory-based homogeneity measure which determines whether a voice comparison is feasible or not.
Convention of Electrical and Electronics Engineers in Israel | 2010
Wafi Abo-Gannemhy; Itshak Lapidot; Hugo Guterman
In this paper we employ backward Viterbi search for speech recognition. In contrast to forward Viterbi search, which is performed from the beginning to the end and where a word depends on the preceding words, backward Viterbi search is performed from the end to the beginning and the current word depends on the following words. As the errors of the forward and the backward searches are not the same, improvement can be achieved by combining the forward and the backward Viterbi searches. The fusion is attained by an expert system based on the ROVER algorithm, using confidence measures for the words and an optimal confidence value for null arcs depending on their place in the word transition network (WTN). The experimental results of the combined system showed significant improvement over both the forward and the backward Viterbi decoding systems on the Number 95 database.
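A ROVER-style vote over an already aligned word transition network can be sketched as below. This is a simplified illustration, not the paper's expert system: it assumes the forward and backward hypotheses have already been aligned into slots, scores each candidate with the common ROVER combination of vote frequency and maximum confidence, and uses a single hypothetical `null_conf` value for null arcs (the paper tunes this value by position in the WTN). The weights and toy confidences are made up.

```python
def rover_vote(slots, alpha=0.5, null_conf=0.3):
    """Pick one word per aligned slot; None marks a null arc (no word).

    slots: list of slots, each a list of (word_or_None, confidence) pairs,
           one pair per recognizer. alpha and null_conf are assumed tuning
           parameters, not values from the paper.
    """
    output = []
    for slot in slots:
        n_systems = len(slot)
        stats = {}  # word -> (vote count, maximum confidence)
        for word, conf in slot:
            if word is None:
                conf = null_conf  # null arcs receive a fixed confidence
            count, best = stats.get(word, (0, 0.0))
            stats[word] = (count + 1, max(best, conf))

        def score(word):
            count, best = stats[word]
            return alpha * count / n_systems + (1 - alpha) * best

        winner = max(stats, key=score)
        if winner is not None:  # a winning null arc emits nothing
            output.append(winner)
    return output


# Hypothetical aligned output of a forward and a backward decoder.
slots = [
    [("three", 0.9), ("three", 0.8)],
    [("five", 0.4), ("nine", 0.7)],
    [(None, 0.0), ("oh", 0.2)],
    [("two", 0.6), (None, 0.0)],
]
print(rover_vote(slots))
```

With only two recognizers the vote counts tie whenever the systems disagree, so the confidence term and the null-arc confidence decide the outcome, which is why the choice of null-arc confidence matters in this setting.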
Archive | 2001
Itshak Lapidot; Hugo Guterman
In this paper a piecewise-dependent-data (PDD) clustering algorithm is presented, and a proof of its convergence to a local minimum is given. A distortion measure-based model represents each cluster. The proposed algorithm is iterative. At the end of each iteration, a competition between the models is performed. Then the data is regrouped between the models. The “movement” of the data between the models and the retraining allows the minimization of the overall system distortion. The Kohonen Self-Organizing Map (SOM) was used as the VQ model for clustering. The clustering algorithm was tested using data generated from four generators of Continuous Density HMM (CDHMM). It was demonstrated that the overall distortion is a decreasing function.
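The iteration described above (retrain the models, let them compete, regroup the data in segments) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: each model is a single centroid rather than a SOM codebook, segments have a fixed hypothetical length `seg_len`, and squared Euclidean distance is the distortion measure.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distortion between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pdd_cluster(vectors, n_models, seg_len, n_iter=20, seed=0):
    """Cluster whole segments: every vector in a segment shares one model."""
    rng = random.Random(seed)
    segments = [vectors[i:i + seg_len] for i in range(0, len(vectors), seg_len)]
    labels = [rng.randrange(n_models) for _ in segments]  # random initialization
    centroids = [list(rng.choice(vectors)) for _ in range(n_models)]
    history = []  # overall distortion after each regrouping
    for _ in range(n_iter):
        # Retraining: each model becomes the mean of the data assigned to it.
        for m in range(n_models):
            members = [v for seg, lab in zip(segments, labels) if lab == m for v in seg]
            if members:  # an empty model keeps its previous centroid
                centroids[m] = [sum(col) / len(members) for col in zip(*members)]
        # Competition: each segment moves, as a group, to its lowest-distortion model.
        new_labels, total = [], 0.0
        for seg in segments:
            costs = [sum(sq_dist(v, c) for v in seg) for c in centroids]
            best = min(range(n_models), key=costs.__getitem__)
            new_labels.append(best)
            total += costs[best]
        history.append(total)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids, history


# Hypothetical data: two alternating sources, five vectors per segment.
rng = random.Random(1)
data = []
for s in range(8):
    center = 0.0 if s % 2 == 0 else 10.0
    data += [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(5)]
labels, centroids, history = pdd_cluster(data, n_models=2, seg_len=5)
print(labels, history)
```

In this simplified setting, retraining cannot increase the distortion for a fixed grouping and regrouping cannot increase it for fixed models, so the recorded `history` is non-increasing, mirroring the convergence behaviour the abstract reports.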
Computer Speech & Language | 2017
Itay Salmun; Ilya Shapiro; Irit Opher; Itshak Lapidot
Use of PLDA as a criterion to choose the best i-vectors for the new mean. Replacing the constant threshold that defines the i-vector neighborhood with kNN, which dramatically increased the stability of the system. This paper extends previous work that used the mean shift algorithm to perform speaker clustering on i-vectors generated from short speech segments. We examine the effectiveness of probabilistic linear discriminant analysis (PLDA) scoring as the metric of the mean shift clustering algorithm in the presence of different numbers of speakers. Our proposed method, combined with k-nearest neighbors (kNN) for bandwidth estimation, yields better and more robust results than cosine similarity with a fixed neighborhood bandwidth when clustering segments from large numbers of speakers. In the case of 30 speakers, we achieved a significant improvement in cluster and speaker purity with the PLDA-based mean shift algorithm compared to the cosine-based baseline system.