Hossein Zeinali
Sharif University of Technology
Publication
Featured research published by Hossein Zeinali.
Conference of the International Speech Communication Association | 2016
Hossein Zeinali; Hossein Sameti; Lukas Burget; Jan Cernocký; Nooshin Maghsoodi; Pavel Matejka
Recently, a new data collection was initiated within the RedDots project in order to evaluate text-dependent and text-prompted speaker recognition technology on data from a wider speaker population and with more realistic noise, channel and phonetic variability. This paper analyses our systems built for the RedDots challenge, an effort to collect and compare the initial results obtained at different sites on this new evaluation data set. We use our recently introduced HMM based i-vector approach, where, instead of the traditional GMM, a set of phone-specific HMMs is used to collect the sufficient statistics for i-vector extraction. Our systems are trained in a completely phrase-independent way on data from the RSR2015 and LibriSpeech databases. We compare systems making use of standard cepstral features and their combination with neural network based bottleneck features. The best results are obtained with a score-level fusion of such systems.
IEEE Transactions on Audio, Speech, and Language Processing | 2017
Hossein Zeinali; Hossein Sameti; Lukas Burget
The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) for training the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization, which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between the i-vectors, normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
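The scoring step mentioned in this abstract (cosine similarity followed by s-norm) can be illustrated with a minimal sketch. It assumes the usual symmetric-normalization formulation, in which the raw score is standardized against cohort scores for both the enrollment and the test vector and the two results are averaged. The function names and the `cohort` matrix are hypothetical; in a phrase-dependent setup the cohort would presumably contain only i-vectors of the same phrase. This is not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def snorm_score(enroll, test, cohort):
    """Symmetric score normalization (s-norm) of a cosine score.

    enroll, test: 1-D i-vectors.
    cohort: (N, D) matrix of impostor i-vectors (assumed here to come
            from the same phrase as the trial).
    """
    raw = cosine(enroll, test)
    cohort_unit = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    # Cosine scores of each side of the trial against every cohort vector.
    enroll_scores = cohort_unit @ (enroll / np.linalg.norm(enroll))
    test_scores = cohort_unit @ (test / np.linalg.norm(test))
    z_enroll = (raw - enroll_scores.mean()) / enroll_scores.std()
    z_test = (raw - test_scores.mean()) / test_scores.std()
    return 0.5 * (z_enroll + z_test)
```

Because both sides of the trial are normalized in the same way, the score is unchanged when the enrollment and test vectors are swapped.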
Odyssey 2016 | 2016
Hossein Zeinali; Lukas Burget; Hossein Sameti; Ondrej Glembek; Oldrich Plchot
Techniques making use of Deep Neural Networks (DNN) have recently been seen to bring large improvements in text-independent speaker recognition. In this paper, we verify that DNN based methods yield excellent performance in the context of text-dependent speaker verification as well. We build our system on the previously introduced HMM based i-vector approach, where phone models are used to obtain frame-level alignment in order to collect sufficient statistics for i-vector extraction. For comparison, we experiment with an alternative alignment obtained directly from the output of a DNN trained for phone classification. We also experiment with DNN based bottleneck features and their combinations with standard cepstral features. Although the i-vector approach is generally considered unsuitable for text-dependent speaker verification, we show that our HMM based approach combined with bottleneck features provides truly state-of-the-art performance on RSR2015 data.
International Conference on Acoustics, Speech, and Signal Processing | 2015
Hossein Zeinali; Elaheh Kalantari; Hossein Sameti; Hossein Hadian
I-vectors have proved to be the most effective features for text-independent speaker verification in recent research. In this article, a new scheme is proposed to utilize i-vectors in text-prompted speaker verification in a simple yet effective manner. In order to examine this scheme empirically, a telephony dataset of Persian month names is introduced. Experiments show that the proposed scheme reduces the EER by 31% compared to the state-of-the-art State-GMM-MAP method. Furthermore, it is shown that using an HMM instead of a GMM for universal background modeling leads to a 15% reduction in EER.
Information Sciences, Signal Processing and their Applications | 2012
Hossein Zeinali; Hossein Sameti; Hossein Khaki; Bagher BabaAli
In large-population Speaker Identification (SI), computation time has become one of the most important issues in recent real-time systems. Test computation time depends on the cost of likelihood computation between the test features and the registered speaker models. For real-time application of speaker identification, the system must identify an unknown speaker quickly. Hence the conventional SI methods cannot be used. In this paper, we propose a two-step method that utilizes two different identification methods. In the first step we use a Nearest Neighbor method to reduce the search space. In the second step we use GMM-based SI methods to determine the target speaker. We achieved a 3.5× speed-up without any loss of accuracy using the proposed method. If the number of best-matching speakers kept after the first step is reduced further, identification accuracy decreases, so there is a trade-off between accuracy and speed-up.
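The two-step idea described in this abstract can be sketched as follows. The abstract does not say how the Nearest Neighbor step is computed, so this sketch assumes each speaker is summarized by a single centroid (a hypothetical choice) and that the test utterance is compared through its mean feature vector. Only the shortlisted speakers are then scored with diagonal-covariance GMMs. All names and the `shortlist_size` parameter are illustrative and not taken from the paper.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                      # (T, K)
    # Log-sum-exp over mixture components, stabilized by the row maximum.
    peak = log_comp.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(frame_ll.mean())

def identify(frames, centroids, gmms, shortlist_size):
    """Two-step identification: cheap shortlist, then GMM scoring.

    centroids: (S, D), one summary vector per speaker (an assumption).
    gmms: list of (weights, means, variances) tuples, one per speaker.
    """
    # Step 1: nearest-neighbor search on the utterance mean.
    query = frames.mean(axis=0)
    dists = np.linalg.norm(centroids - query, axis=1)
    shortlist = np.argsort(dists)[:shortlist_size]
    # Step 2: full GMM likelihood only for the shortlisted speakers.
    scores = [gmm_avg_loglik(frames, *gmms[i]) for i in shortlist]
    return int(shortlist[int(np.argmax(scores))])
```

A smaller `shortlist_size` means fewer GMM evaluations but a higher chance that the true speaker is discarded in the first step, which is the accuracy versus speed-up trade-off the abstract mentions.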
IET Biometrics | 2017
Hossein Zeinali; Bagher BabaAli; Hossein Hadian
Signature verification (SV) is one of the common methods for identity verification in banking, where, for security reasons, it is very important to have an accurate method for automatic SV (ASV). ASV is usually addressed by comparing the test signature with the enrolment signature(s) signed by the individual whose identity is claimed, in one of two manners: online or offline. In this study, a new method based on the i-vector is proposed for online SV. In the proposed method, a fixed-length vector, called an i-vector, is extracted from each signature and then used for template making. Several techniques such as nuisance attribute projection (NAP) and within-class covariance normalisation (WCCN) are also investigated in order to reduce the intra-class variation in the i-vector space. In the scoring and decision making stage, the authors also propose to apply a 2-class support vector machine method. Experimental results show that the proposed method achieves an 8.75% equal error rate (EER) on the SigWiComp2013 database in the best case. On the SVC2004 database, it achieves a 5% EER, an 11% relative improvement over the best reported result. In addition to its considerable accuracy gain, it shows a significant improvement in computational cost over the conventional dynamic time warping method.
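As an illustration of the WCCN step named in this abstract, the following is a minimal sketch of the standard formulation: the within-class covariance W is averaged over classes, and vectors are projected with a matrix B satisfying B Bᵀ = W⁻¹. The `ridge` parameter is a hypothetical addition for numerical stability and is not described in the abstract; the function name is likewise illustrative.

```python
import numpy as np

def wccn_projection(vectors, labels, ridge=0.0):
    """Return B such that B @ B.T equals the inverse within-class covariance.

    vectors: (N, D) i-vectors; labels: (N,) class (writer or speaker) ids.
    ridge: optional diagonal loading (an assumption, not from the source).
    """
    dim = vectors.shape[1]
    within = np.zeros((dim, dim))
    classes = np.unique(labels)
    for c in classes:
        x = vectors[labels == c]
        centered = x - x.mean(axis=0)
        within += centered.T @ centered / len(x)
    within /= len(classes)
    within += ridge * np.eye(dim)
    # Cholesky factor of the inverse: lower-triangular B with B @ B.T = W^-1.
    return np.linalg.cholesky(np.linalg.inv(within))
```

Row vectors are projected as `vectors @ B`. With `ridge=0`, the within-class covariance of the projected training vectors becomes the identity matrix, which is what reduces intra-class variation before scoring.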
Computers & Electrical Engineering | 2015
Hossein Zeinali; Alireza Mirian; Hossein Sameti; Bagher BabaAli
Cosine similarity and Probabilistic Linear Discriminant Analysis (PLDA) in i-vector space are two state-of-the-art scoring methods in the speaker verification field. While PLDA usually gives better accuracy, Cosine Similarity Scoring (CSS) remains widely used due to its simplicity and acceptable performance. In this domain, several channel compensation and score normalization methods have been proposed to improve performance. We investigate non-speaker information in the cosine similarity metric and propose a new approach to remove it from the decision making process. I-vectors hold a large amount of non-speaker information such as channel effects, language, and phonetic content. This type of information increases the verification error rate and hence should be removed from the scoring method. To this end, we propose a method that estimates the non-speaker information between two i-vectors using the development set and subtracts it from the cosine similarity. The results indicate that the proposed method performs better than the other implemented methods based on cosine similarity. Furthermore, in certain cases this method performs better than PLDA, and combining it with PLDA improves performance in most cases.
International Conference on Signal Processing | 2012
Hossein Zeinali; Hossein Sameti; Bagher BabaAli
Computer Speech & Language | 2017
Hossein Zeinali; Hossein Sameti; Lukas Burget; Jan Cernocký
arxiv:eess.AS | 2018
Hossein Zeinali; Lukas Burget; Jan Cernocky