Ali Khodabakhsh | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ali Khodabakhsh is active.

Explore More

Publication

Featured researches published by Ali Khodabakhsh.

international conference on acoustics, speech, and signal processing | 2015

SAS: A speaker verification spoofing database containing diverse attacks

Zhizheng Wu; Ali Khodabakhsh; Junichi Yamagishi; Daisuke Saito; Tomoki Toda; Simon King

This paper presents the first version of a speaker verification spoofing and anti-spoofing database, named SAS corpus. The corpus includes nine spoofing techniques, two of which are speech synthesis, and seven are voice conversion. We design two protocols, one for standard speaker verification evaluation, and the other for producing spoofing materials. Hence, they allow the speech synthesis community to produce spoofing materials incrementally without knowledge of speaker verification spoofing and anti-spoofing. To provide a set of preliminary results, we conducted speaker verification experiments using two state-of-the-art systems. Without any anti-spoofing techniques, the two systems are extremely vulnerable to the spoofing attacks implemented in our SAS corpus.

IEEE Transactions on Audio, Speech, and Language Processing | 2016

Anti-spoofing for text-independent speaker verification: an initial database, comparison of countermeasures, and human performance

Zhizheng Wu; Phillip L. De Leon; Ali Khodabakhsh; Simon King; Zhen-Hua Ling; Daisuke Saito; Bryan Stewart; Tomoki Toda; Mirjam Wester; Junichi Yamagishi

In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks.

biomedical and health informatics | 2014

Natural language features for detection of Alzheimer's disease in conversational speech

Ali Khodabakhsh; Serhan Kusçuoglu

Automatic monitoring of the patients with Alzheimers disease and diagnosis of the disease in early stages can have a significant impact on the society. Here, we investigate an automatic diagnosis approach through the use of features derived from transcriptions of conversations with the subjects. As opposed to standard tests that are mostly focused on memory recall, spontaneous conversations are carried with the subjects in informal settings. Features extracted from the transcriptions of the conversations could discriminate between healthy people and patients with high reliability. Although the results are preliminary and patients were in later stages of Alzheimers disease, results indicate the potential use of the proposed natural language based features in the early stages of the disease also. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application using automatic speech recognition systems (ASR) which are known to have very high accuracies in recent years. Thus, the investigated features hold the potential to make it low-cost and convenient to diagnose the disease and monitor the diagnosed patients over time.

Methods of Molecular Biology | 2015

Analysis of speech-based measures for detecting and monitoring Alzheimer’s disease

Ali Khodabakhsh

Automatic diagnosis of the Alzheimers disease as well as monitoring of the diagnosed patients can make significant economic impact on societies. We investigated an automatic diagnosis approach through the use of speech based features. As opposed to standard tests, spontaneous conversations are carried and recorded with the subjects. Speech features could discriminate between healthy people and the patients with high reliability. Although the patients were in later stages of Alzheimers disease, results indicate the potential of speech-based automated solutions for Alzheimers disease diagnosis. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application. Thus, the investigated techniques hold the potential to significantly reduce the financial burden on governments and Alzheimers patients.

international symposium on telecommunications | 2012

Persica: A Persian corpus for multi-purpose text mining and natural language processing

Hamid Eghbalzadeh; Behrooz Hosseini; Shahram Khadivi; Ali Khodabakhsh

Lack of multi-application text corpus despite of the surging text data is a serious bottleneck in the text mining and natural language processing especially in Persian language. This paper presents a new corpus for NEWS articles analysis in Persian called Persica. NEWS analysis includes NEWS classification, topic discovery and classification, trend discovery, category classification and many more procedures. Dealing with NEWS has special requirements. First of all it needs a valid and NEWS-content-enriched corpus to perform the experiments. Our Approach is based on a modified category classification and data normalization over Persian NEWS articles which has led to creation of a multipurpose Persian corpus which shows reasonable results in text mining outcomes. In the literature, regarding to our knowledge there are few Persian corpuses but none of them have Persian NEWS time trend characteristics. Empirical results on our benchmark indicate that in addition to reducing the problem dimensions and useless content, Persica keeps admissible validity and reliability in comparison with standard corpuses in the literature.

IEEE Journal of Selected Topics in Signal Processing | 2017

Postprocessing Synthetic Speech With a Complex Cepstrum Vocoder for Spoofing Phase-Based Synthetic Speech Detectors

Osman Büyük; Ali Khodabakhsh; Ranniery Maia

State-of-the-art speaker verification systems are vulnerable to spoofing attacks. To address the issue, high-performance synthetic speech detectors (SSDs) for existing spoofing methods have been proposed. Phase-based SSDs that exploit the fact that most of the parametric speech coders use minimum-phase filters are particularly successful when synthetic speech is generated with a parametric vocoder. Here, we propose a new attack strategy to spoof phase-based SSDs with the objective of increasing the security of voice verification systems by enabling the development of more generalized SSDs. As opposed to other parametric vocoders, the complex cepstrum approach uses mixed-phase filters, which makes it an ideal candidate for spoofing the phase-based SSDs. We propose using a complex cepstrum vocoder as a postprocessor to existing techniques to spoof the speaker verification system as well as the phase-based SSDs. Once synthetic speech is generated with a speech synthesis or a voice conversion technique, for each synthetic speech frame, a natural frame is selected from a training database using a spectral distance measure. Then, complex cepstrum parameters of the natural frame are used for resynthesizing the synthetic frame. In the proposed method, complex cepstrum-based resynthesis is used as a postprocessor. Hence, it can be used in tandem with any synthetic speech generator. Experimental results showed that the approach is successful at spoofing four phase-based SSDs across nine parametric attack algorithms. Moreover, performance at spoofing the speaker verification system did not substantially degrade compared to the case when no postprocessor is employed.

international conference on acoustics, speech, and signal processing | 2016

OCR-aided person annotation and label propagation for speaker modeling in TV shows

Mateusz Budnik; Laurent Besacier; Ali Khodabakhsh

In this paper, we present an approach for minimizing human effort in manual speaker annotation. Label propagation is used at each iteration of an active learning cycle. More precisely, a selection strategy for choosing the most suitable speech track to be labeled is proposed. Four different selection strategies are evaluated and all the tracks in a corresponding cluster are gathered using agglomerative clustering in order to propagate human annotations. To further reduce the manual labor required, an optical character recognition system is used to bootstrap annotations. At each step of the cycle, annotations are used to build speaker models. The quality of the generated speaker models is evaluated at each step using an i-vector based speaker identification system. The presented approach shows promising results on the REPERE corpus with a minimum amount of human effort for annotation.

european signal processing conference | 2016

Spoofing attacks to i-vector based voice verification systems using statistical speech synthesis with additive noise and countermeasure

Mustafa Caner Ozbay; Ali Khodabakhsh; Amir Mohammadi

Even though improvements in the speaker verification (SV) technology with i-vectors increased their real-life deployment, their vulnerability to spoofing attacks is a major concern. Here, we investigated the effectiveness of spoofing attacks with statistical speech synthesis systems using limited amount of adaptation data and additive noise. Experiment results show that effective spoofing is possible using limited adaptation data. Moreover, the attacks get substantially more effective when noise is intentionally added to synthetic speech. Training the SV system with matched noise conditions does not alleviate the problem. We propose a synthetic speech detector (SSD) that uses session differences in i-vectors for counterspoofing. The proposed SSD had less than 0.5% total error rate in most cases for the matched noise conditions. For the mismatched noise conditions, missed detection rate further decreased but total error increased which indicates that some calibration is needed for mismatched noise conditions.

Odyssey 2016 | 2016

Deep complementary features for speaker identification in TV broadcast data

Mateusz Budnik; Ali Khodabakhsh; Laurent Besacier

This work tries to investigate the use of a Convolutional Neu-ral Network approach and its fusion with more traditional systems such as Total Variability Space for speaker identification in TV broadcast data. The former uses spectrograms for training, while the latter is based on MFCC features. The dataset poses several challenges such as significant class imbalance or background noise and music. Even though the performance of the Convolutional Neural Network is lower than the state-of-the-art, it is able to complement it and give better results through fusion. Different fusion techniques are evaluated using both early and late fusion.

Eurasip Journal on Audio, Speech, and Music Processing | 2015