Publication

Featured research published by Zahi N. Karam.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech

Zahi N. Karam; Emily Mower Provost; Satinder P. Singh; Jennifer Montgomery; Christopher Archer; Gloria J. Harrington; Melvin G. McInnis

Speech patterns are modulated by the emotional and neurophysiological state of the speaker. There exists a growing body of work that computationally examines this modulation in patients suffering from depression, autism, and post-traumatic stress disorder. However, the majority of the work in this area focuses on the analysis of structured speech collected in controlled environments. Here we expand on the existing literature by examining bipolar disorder (BP). BP is characterized by mood transitions, varying from a healthy euthymic state to states characterized by mania or depression. The speech patterns associated with these mood states provide a unique opportunity to study the modulations characteristic of mood variation. We describe a methodology to collect unstructured speech continuously and unobtrusively via the recording of day-to-day cellular phone conversations. Our pilot investigation suggests that manic and depressive mood states can be recognized from this speech data, providing new insight into the feasibility of unobtrusive, unstructured, and continuous speech-based wellness monitoring for individuals with BP.


International Conference on Acoustics, Speech, and Signal Processing | 2008

A multi-class MLLR kernel for SVM speaker recognition

Zahi N. Karam; William M. Campbell

Speaker recognition using support vector machines (SVMs) with features derived from generative models has been shown to perform well. Typically, a universal background model (UBM) is adapted to each utterance, yielding a set of features that are used in an SVM. We consider the case where the UBM is a Gaussian mixture model (GMM), and maximum likelihood linear regression (MLLR) adaptation is used to adapt the means of the UBM. Recent work has examined this setup for the case where a global MLLR transform is applied to all the mixture components of the GMM UBM. This work produced positive results that warrant examining this setup with multi-class MLLR adaptation, which groups the UBM mixture components into classes and applies a different transform to each class. This paper extends the MLLR/GMM framework to the multi-class case. Experiments on the NIST SRE 2006 corpus show that multi-class MLLR improves on global MLLR and that the proposed system's performance is comparable with state-of-the-art systems.
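The feature construction behind an MLLR kernel can be sketched as follows. This is an illustrative toy, not the paper's implementation: the helper names (`mllr_supervector`, `linear_kernel`) and the tiny 2-D transforms are made up, and a real system would estimate the per-class transforms from a GMM UBM.

```python
import numpy as np

def mllr_supervector(transforms):
    """Stack per-class MLLR transform parameters into one feature vector.

    transforms: list of (A, b) pairs, one per group of mixture components,
    where A is d x d and b is length d (mean adaptation mu -> A mu + b).
    """
    return np.concatenate([np.concatenate([A.ravel(), b]) for A, b in transforms])

def linear_kernel(u, v):
    """Linear SVM kernel between two MLLR-derived feature vectors."""
    return float(np.dot(u, v))

# Toy example: two utterances, two MLLR classes, 2-dimensional means.
utt1 = [(np.eye(2), np.zeros(2)), (np.eye(2), np.ones(2))]
utt2 = [(np.eye(2) * 1.1, np.zeros(2)), (np.eye(2), np.ones(2))]
v1, v2 = mllr_supervector(utt1), mllr_supervector(utt2)
score = linear_kernel(v1, v2)
```

In practice the stacked transform parameters are also normalized before the kernel is applied; multi-class adaptation simply makes the stacked vector longer, one (A, b) block per class.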


International Conference on Acoustics, Speech, and Signal Processing | 2010

Language recognition using deep-structured conditional random fields

Dong Yu; Shizhen Wang; Zahi N. Karam; Li Deng

We present a novel language identification technique using our recently developed deep-structured conditional random fields (CRFs). The deep-structured CRF is a multi-layer CRF model in which each higher layer's input observation sequence consists of the lower layer's observation sequence and the resulting lower layer's frame-level marginal probabilities. In this paper we extend the original deep-structured CRF by allowing for distinct state representations at different layers and demonstrate its benefits. We propose an unsupervised algorithm to pre-train the intermediate layers by casting it as a multi-objective programming problem that is aimed at minimizing the average frame-level conditional entropy while maximizing the state occupation entropy. Empirical evaluation on a seven-language/dialect voice mail routing task showed that our approach can achieve a routing accuracy (RA) of 86.4% and average equal error rate (EER) of 6.6%. These results are significantly better than the 82.5% RA and 7.5% average EER obtained using the Gaussian mixture model trained with the maximum mutual information criterion but slightly worse than the 87.7% RA and 6.4% EER achieved using the support vector machine with model pushing on the Gaussian super vector (GSV).
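The layer-stacking idea can be sketched in a few lines. This is only a shape illustration under stated assumptions: a real deep-structured CRF computes the frame-level marginals with forward-backward over the layer's CRF, whereas here a per-frame softmax of linear scores stands in for them, and all names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frame_marginals(X, W):
    """Stand-in for a layer's frame-level marginals p(state | frame)."""
    return softmax(X @ W)

def stack_layer_input(X, marginals):
    """Higher layer's observations: lower layer's input plus its marginals."""
    return np.hstack([X, marginals])

T, d, n_states = 5, 3, 4                      # frames, feature dim, states
X = np.random.default_rng(1).normal(size=(T, d))
W1 = np.random.default_rng(2).normal(size=(d, n_states))
M = frame_marginals(X, W1)                    # lower layer's per-frame posteriors
X2 = stack_layer_input(X, M)                  # input to the next CRF layer
```

Each added layer widens the observation by `n_states` columns of probabilities, which is what lets higher layers refine the lower layers' frame-level decisions.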


International Conference on Acoustics, Speech, and Signal Processing | 2011

A channel-blind system for speaker verification

Najim Dehak; Zahi N. Karam; Douglas A. Reynolds; Réda Dehak; William M. Campbell; James R. Glass

The majority of speaker verification systems proposed in the NIST speaker recognition evaluation are conditioned on the type of data to be processed: telephone or microphone. In this paper, we propose a new speaker verification system that can be applied to both types of data. This system, termed a blind system, is based on an extension of the total variability framework. Recognition results with the proposed channel-independent system are comparable to state-of-the-art systems that require conditioning on the channel type. Another advantage of our proposed system is that it allows for combining data from multiple channels in the same visualization in order to explore the effects of different microphones and collection environments.
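Scoring in the total variability framework typically reduces to comparing low-dimensional utterance factors (i-vectors). A minimal sketch of that comparison, with toy 2-D vectors standing in for real i-vectors extracted by a trained total variability matrix:

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine similarity between two total-variability i-vectors."""
    return float(np.dot(w_enroll, w_test)
                 / (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))

# Toy i-vectors; real ones are several-hundred-dimensional factors
# extracted from each utterance's GMM sufficient statistics.
w1 = np.array([0.6, 0.8])
w2 = np.array([1.0, 0.0])
score = cosine_score(w1, w2)   # 0.6
```

Because the score depends only on the angle between the factors, channel compensation (e.g. removing nuisance directions before scoring) is what a channel-blind variant of the framework has to get right.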


Schizophrenia Research | 2013

Can P300 distinguish among schizophrenia, schizoaffective and bipolar I disorders? An ERP study of response inhibition.

J. Chun; Zahi N. Karam; F. Marzinzik; Masoud Kamali; Lisa O'Donnell; Ivy F. Tso; Theo C. Manschreck; Melvin G. McInnis; Patricia J. Deldin

Research utilizing visual event-related brain potentials (ERPs) has demonstrated that reduced P300 amplitude and prolonged latency may qualify as a biological marker (biomarker) for schizophrenia (SZ). We examined P300 characteristics in response inhibition among three putatively distinct psychopathology groups, schizophrenia (SZ), bipolar I disorder (BD) and schizoaffective disorder (SA), in comparison with healthy controls (CT) to determine their electrophysiological distinctiveness. In two separate studies, deficits in response inhibition indexed by the P300 component were investigated using a lateralized Go/NoGo task. We hypothesized that deficits in response inhibition would be present and distinctive among the groups. In both studies, SZ showed response inhibition deficits as measured by P300 when stimuli were presented to the right visual field. In Study 2, delayed cognitive stimulus evaluation was observed in BD as indexed by prolonged P300 latency for NoGo trials. Six of thirty-six NoGo P300 variables (18 amplitude, 18 latency) correctly classified SZ (79%), SA (64%) and CT (80%) in Study 1, and seven variables selected in Study 2 classified SZ (61%), BD (67%) and CT (68%), in each case with accuracy higher than chance level (33%). The findings suggest that distinct P300 features in response inhibition may be biomarkers with the capacity to distinguish BD and SZ, although SA was not clearly distinguishable from SZ and CT.


International Conference on Acoustics, Speech, and Signal Processing | 2011

The MIT LL 2010 speaker recognition evaluation system: Scalable language-independent speaker recognition

Douglas E. Sturim; William M. Campbell; Najim Dehak; Zahi N. Karam; Alan McCree; Douglas A. Reynolds; Fred Richardson; Pedro A. Torres-Carrasquillo; Stephen Shum

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of the recent 2010 NIST Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST SRE with two main goals—language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent, computationally scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.


Conference of the International Speech Communication Association | 2013

Graph Embedding for Speaker Recognition

Zahi N. Karam; William M. Campbell

This chapter presents applications of graph embedding to the problem of text-independent speaker recognition. Speaker recognition is a general term encompassing multiple applications. At the core is the problem of speaker comparison—given two speech recordings (utterances), produce a score which measures speaker similarity. Using speaker comparison, other applications can be implemented—speaker clustering (grouping similar speakers in a corpus), speaker verification (verifying a claim of identity), speaker identification (identifying a speaker out of a list of potential candidates), and speaker retrieval (finding matches to a query set).


International Conference on Digital Signal Processing | 2007

Computation of the One-Dimensional Unwrapped Phase

Zahi N. Karam; Alan V. Oppenheim

In this paper, the computation of the unwrapped phase of the discrete-time Fourier transform (DTFT) of a one-dimensional finite-length signal is explored. The phase of the DTFT is not unique, and may contain discontinuities of integer multiples of 2π. The unwrapped phase is the instance of the phase function chosen to ensure continuity. This paper compares existing algorithms for computing the unwrapped phase. Then, two composite algorithms are proposed that build upon the existing ones. The core of the proposed methods is based on recent advances in polynomial factoring. The proposed methods are implemented and compared to the existing ones.
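The basic problem is easy to see numerically. The sketch below uses NumPy's simple sequential unwrapper on densely sampled DTFT phase; it illustrates the 2π-jump removal the paper addresses but is not the polynomial-factoring method the paper proposes, and sequential unwrapping can fail when the phase is sampled too coarsely.

```python
import numpy as np

# Phase of the DTFT of a finite-length signal, sampled on a dense grid
# via a zero-padded FFT, then unwrapped to remove the 2*pi jumps.
x = np.array([1.0, -2.0, 0.5, 3.0])     # toy finite-length signal
X = np.fft.fft(x, 4096)                 # dense samples of the DTFT
wrapped = np.angle(X)                   # principal value in (-pi, pi]
unwrapped = np.unwrap(wrapped)          # add/subtract 2*pi for continuity
```

After unwrapping, adjacent phase samples differ by at most π, whereas the wrapped phase can jump by nearly 2π at a branch cut.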


IEEE Signal Processing Workshop on Statistical Signal Processing | 2011

Graph relational features for speaker recognition and mining

Zahi N. Karam; William M. Campbell; Najim Dehak

Recent advances in the field of speaker recognition have resulted in highly efficient speaker comparison algorithms [1] [2]. The advent of these algorithms allows for leveraging a background set, consisting of large numbers of unlabeled recordings, to improve recognition [3] [4]. In this work, a relational graph, where nodes represent utterances and links represent speaker similarity, is created from the background recordings, and the recordings of interest, train and test, are then embedded in it. Relational features computed from the embedding are then used to obtain a match score between the recordings of interest. We show the efficacy of these features in speaker verification and speaker mining tasks.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Towards reduced false-alarms using cohorts

Zahi N. Karam; William M. Campbell; Najim Dehak

The focus of the 2010 NIST Speaker Recognition Evaluation (SRE) [1] was the low false-alarm regime of the detection error trade-off (DET) curve. This paper presents several approaches that specifically target this issue. It begins by highlighting the main problem with operating in the low false-alarm regime. Two sets of methods to tackle this issue are presented, both of which require a large and diverse impostor set: the first penalizes trials whose enrollment and test utterances are not nearest neighbors of each other, while the second takes an adaptive score normalization approach similar to TopNorm [2] and ATNorm [3].
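The adaptive-normalization idea can be sketched as follows: standardize a raw trial score against the statistics of its most competitive cohort impostors, from both sides of the trial. This is a generic top-N cohort z-normalization in the spirit of TopNorm/ATNorm, not the exact formulas of those papers, and the function names are hypothetical.

```python
import numpy as np

def adaptive_norm(raw, enroll_vs_cohort, test_vs_cohort, top_n):
    """Normalize a trial score by its top-scoring cohort statistics.

    enroll_vs_cohort / test_vs_cohort: scores of the enrollment (resp. test)
    utterance against a large, diverse impostor cohort.
    """
    def z(score, cohort_scores):
        top = np.sort(cohort_scores)[-top_n:]   # most-competitive impostors
        return (score - top.mean()) / top.std()
    # Symmetric normalization from both sides of the trial.
    return 0.5 * (z(raw, enroll_vs_cohort) + z(raw, test_vs_cohort))

cohort = np.array([0.1, 0.2, 0.3, 0.4])
normed = adaptive_norm(0.9, cohort, cohort, top_n=2)
```

Using only the top-scoring cohort members makes the normalization adapt to each trial's hardest impostors, which is exactly where false alarms at low thresholds come from.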

Collaboration

Top co-authors of Zahi N. Karam:

William M. Campbell (Massachusetts Institute of Technology)
Najim Dehak (Massachusetts Institute of Technology)
Douglas A. Reynolds (Massachusetts Institute of Technology)
Douglas E. Sturim (Massachusetts Institute of Technology)
Fred Richardson (Massachusetts Institute of Technology)