Publication


Featured research published by Meysam Asgari.


Computer Speech & Language | 2015

Fully automated assessment of the severity of Parkinson's disease from speech

Alireza Bayestehtashk; Meysam Asgari; Izhak Shafran; James McNames

For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach in which a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks - a sustained phonation task, a diadochokinetic task, and a reading task, all within a time budget of 4 minutes, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and performing consistently well above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than the diadochokinetic or sustained phonation tasks. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical applications in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in the clinical domain.
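The model comparison described above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the paper's pipeline: the feature matrix here is a random stand-in for the 1582 openSMILE features, the target mimics a clinical severity score, and the regularization strengths are arbitrary placeholder values.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for 1582 openSMILE features from 168 subjects;
# the target mimics a severity score driven by a sparse subset of features.
rng = np.random.default_rng(0)
X = rng.normal(size=(168, 1582))
true_w = np.zeros(1582)
true_w[:20] = rng.normal(size=20)
y = X @ true_w + rng.normal(scale=2.0, size=168)

models = {
    "ridge": Ridge(alpha=10.0),
    "lasso": Lasso(alpha=0.1),
    "svr": SVR(kernel="linear", C=1.0),
}
for name, model in models.items():
    # sklearn reports negated MAE; flip the sign for readability.
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: cross-validated MAE = {mae:.2f}")
```

Cross-validated mean absolute error is the same figure of merit the abstract reports (about 5.5 on the real data), which makes the three regularized regressions directly comparable.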


International Conference of the IEEE Engineering in Medicine and Biology Society | 2010

Predicting severity of Parkinson's disease from speech

Meysam Asgari; Izhak Shafran

Parkinson's disease is known to cause mild to profound communication impairments depending on the stage of progression of the disease. There is growing interest in home-based assessment tools for measuring the severity of Parkinson's disease, and speech is an appealing source of evidence. This paper reports tasks to elicit a versatile sample of voice production, algorithms to extract useful information from speech, and models to predict the severity of the disease. Apart from standard features from the time domain (e.g., energy, speaking rate), spectral domain (e.g., pitch, spectral entropy), and cepstral domain (e.g., mel-frequency warped cepstral coefficients), we also estimate harmonic-to-noise ratio, shimmer, and jitter using our recently developed algorithms. In a preliminary study, we evaluate the proposed paradigm on data collected through 2 clinics from 82 subjects in 116 assessment sessions. Our results show that the information extracted from speech, elicited through 3 tasks, can predict the severity of the disease to within a mean absolute error of 5.7 with respect to the clinical assessment using the Unified Parkinson's Disease Rating Scale; the range of the target motor sub-scale is 0 to 108. Our analysis shows that eliciting speech through a less constrained task provides useful information not captured in the widely employed phonation task. While still preliminary, our results demonstrate that the proposed computational approach has promising real-world applications such as home-based assessment or telemonitoring of Parkinson's disease.


International Workshop on Machine Learning for Signal Processing | 2010

Extracting cues from speech for predicting severity of Parkinson's disease

Meysam Asgari; Izhak Shafran

Speech pathologists often describe voice quality in hypokinetic dysarthria or Parkinsonism as harsh or breathy, which has been largely attributed to incomplete closure of the vocal folds. Exploiting its harmonic nature, we separate the voiced portion of the speech to obtain an objective estimate of this quality. The utility of the proposed approach was evaluated on predicting 116 clinical ratings of Parkinson's disease for 82 subjects. Our results show that the information extracted from speech, elicited through 3 tasks, can predict the motor subscore (range 0 to 108) of the clinical measure, the Unified Parkinson's Disease Rating Scale, within a mean absolute error of 5.7 and a standard deviation of about 2.0. While still preliminary, our results are significant and demonstrate that the proposed computational approach has promising real-world applications such as home-based assessment or telemonitoring of Parkinson's disease.


Current Alzheimer Research | 2015

Social Markers of Mild Cognitive Impairment: Proportion of Word Counts in Free Conversational Speech

Hiroko H. Dodge; Nora Mattek; Mattie Gregor; Molly Bowman; Adriana Seelye; Oscar Ybarra; Meysam Asgari; Jeffrey Kaye

Background: Detecting early signs of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) during the pre-symptomatic phase is becoming increasingly important for cost-effective clinical trials and also for deriving maximum benefit from currently available treatment strategies. However, distinguishing early signs of MCI from normal cognitive aging is difficult. Biomarkers have been extensively examined as early indicators of the pathological process for AD, but assessing these biomarkers is expensive and challenging to apply widely among pre-symptomatic community dwelling older adults. Here we propose assessment of social markers, which could provide an alternative or complementary and ecologically valid strategy for identifying the pre-symptomatic phase leading to MCI and AD. Methods: The data came from a larger randomized controlled clinical trial (RCT), where we examined whether daily conversational interactions using remote video telecommunications software could improve cognitive functions of older adult participants. We assessed the proportion of words generated by participants out of total words produced by both participants and staff interviewers using transcribed conversations during the intervention trial as an indicator of how two people (participants and interviewers) interact with each other in one-on-one conversations. We examined whether the proportion differed between those with intact cognition and MCI, using first, generalized estimating equations with the proportion as outcome, and second, logistic regression models with cognitive status as outcome in order to estimate the area under ROC curve (ROC AUC). Results: Compared to those with normal cognitive function, MCI participants generated a greater proportion of words out of the total number of words during the timed conversation sessions (p=0.01). This difference remained after controlling for participant age, gender, interviewer and time of assessment (p=0.03). 
The logistic regression models showed that the ROC AUC for identifying MCI (vs. normal cognition) was 0.71 (95% Confidence Interval: 0.54 – 0.89) when the average proportion of word counts spoken by subjects was included as a single predictor in the model. Conclusion: An ecologically valid social marker such as the proportion of spoken words produced during spontaneous conversations may be sensitive to transitions from normal cognition to MCI.
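The univariate logistic-regression AUC analysis can be illustrated in a few lines. This is a sketch on fabricated data, not the study's data: the word-count proportions, group sizes, and effect size below are invented solely to show the mechanics of fitting a single-predictor classifier and computing the ROC AUC.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Fabricated word-count proportions: MCI speakers are given a slightly
# larger share of the words per conversation, mirroring the reported trend.
rng = np.random.default_rng(1)
prop_normal = rng.normal(loc=0.55, scale=0.08, size=40)
prop_mci = rng.normal(loc=0.62, scale=0.08, size=20)
X = np.concatenate([prop_normal, prop_mci]).reshape(-1, 1)
y = np.concatenate([np.zeros(40), np.ones(20)])  # 1 = MCI

clf = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"ROC AUC: {auc:.2f}")
```

With a single monotone predictor, the AUC of the fitted model equals the AUC of the raw proportion itself; the logistic model mainly supplies a calibrated probability scale.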


Spoken Language Technology Workshop | 2012

Robust detection of voiced segments in samples of everyday conversations using unsupervised HMMs

Meysam Asgari; Izhak Shafran; Alireza Bayestehtashk

We investigate methods for detecting voiced segments in everyday conversations from ambient recordings. Such recordings contain a high diversity of background noise, making it difficult or infeasible to collect representative labelled samples for estimating noise-specific HMM models. The popular utility get-f0 and its derivatives compute normalized cross-correlation for detecting voiced segments, which unfortunately is sensitive to different types of noise. Exploiting the fact that voiced speech is not just periodic but also rich in harmonics, we model voiced segments by adopting harmonic models, which have recently gained considerable attention. In previous work, the parameters of the model were estimated independently for each frame using a maximum likelihood criterion. However, since the distribution of harmonic coefficients depends on the speakers' articulators, we estimate the model parameters more robustly using a maximum a posteriori criterion. We use the likelihood of voicing, computed from the harmonic model, as the observation probability of an HMM and detect speech using this unsupervised HMM. One caveat of harmonic models is that they fail to distinguish speech from other stationary harmonic noise. We rectify this weakness by taking advantage of the non-stationary nature of speech. We evaluate our models empirically on the task of detecting speech in a large corpus of everyday speech and demonstrate that these models perform significantly better than the standard voice detection algorithms employed in popular tools.
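The smoothing step described above — turning noisy per-frame voicing likelihoods into segment decisions via a two-state HMM — can be sketched with a small Viterbi decoder. The per-frame probabilities below are hypothetical placeholders (in the paper they come from the harmonic model), and the sticky transition matrix is an assumed value, not one estimated from data.

```python
import numpy as np

def viterbi_two_state(log_obs, log_trans, log_init):
    """Viterbi decoding for a 2-state (0 = unvoiced, 1 = voiced) HMM.
    log_obs: (T, 2) per-frame log observation probabilities."""
    T = log_obs.shape[0]
    delta = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    delta[0] = log_init + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Hypothetical per-frame voicing likelihoods: noisy, with a clearly
# voiced stretch in the middle of the recording.
rng = np.random.default_rng(2)
p_voiced = np.clip(rng.normal(0.2, 0.15, 30), 0.01, 0.99)
p_voiced[10:20] = np.clip(rng.normal(0.8, 0.15, 10), 0.01, 0.99)
log_obs = np.log(np.stack([1 - p_voiced, p_voiced], axis=1))

# Sticky transitions favour staying in the current state, suppressing
# single-frame flips that a hard threshold on p_voiced would produce.
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))
path = viterbi_two_state(log_obs, log_trans, log_init)
print(path)
```

The design point is that the transition prior trades a small delay at segment boundaries for much cleaner voiced/unvoiced runs than frame-wise thresholding.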


Alzheimer's & Dementia: Translational Research & Clinical Interventions | 2017

Predicting mild cognitive impairment from spontaneous spoken utterances

Meysam Asgari; Jeffrey Kaye; Hiroko H. Dodge

Trials in Alzheimer's disease are increasingly focusing on prevention in asymptomatic individuals. We hypothesized that indicators of mild cognitive impairment (MCI) may be present in the content of spoken language in older adults and be useful in distinguishing those with MCI from those who are cognitively intact. To test this hypothesis, we performed linguistic analyses of spoken words in participants with MCI and those with intact cognition participating in a clinical trial.


Australasian Telecommunication Networks and Applications Conference | 2008

Voice Activity Detection Using Entropy in Spectrum Domain

Meysam Asgari; Abolghasem Sayadian; Mohsen Farhadloo; Elahe abouie Mehrizi

In this paper we develop a voice activity detection algorithm based on entropy estimation of the magnitude spectrum. In addition, a likelihood ratio test (LRT) is employed to determine a threshold that separates speech segments from non-speech segments. The distributions of the entropy magnitude of clean speech and of the noise signal are assumed to be Gaussian. Applying the concept of entropy to the speech detection problem rests on the assumption that the signal spectrum is more organized during speech segments than during noise segments. One of the main advantages of this method is that it is not very sensitive to changes in noise level. Our simulation results show that the entropy-based VAD performs well in low signal-to-noise ratio (SNR) conditions (SNR < 0 dB).
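The core idea — a harmonic spectrum is more "organized" (lower entropy) than a noise spectrum — is easy to demonstrate. This sketch uses a synthetic signal and a simple mean-entropy threshold in place of the paper's LRT-derived one; the frame length, sampling rate, and harmonic structure are arbitrary choices for illustration.

```python
import numpy as np

def spectral_entropy(frame, n_fft=256):
    """Entropy of the normalized magnitude spectrum of one frame."""
    mag = np.abs(np.fft.rfft(frame, n_fft))
    p = mag / (mag.sum() + 1e-12)          # treat the spectrum as a pmf
    return -(p * np.log2(p + 1e-12)).sum()

# Synthetic signal: white noise everywhere, with a voiced-like segment
# (a few harmonics of 150 Hz) occupying the middle half second.
fs = 8000
rng = np.random.default_rng(3)
sig = rng.normal(scale=0.5, size=fs)
t = np.arange(fs // 2) / fs
tone = sum(np.sin(2 * np.pi * 150 * k * t) for k in range(1, 5))
sig[fs // 4: fs // 4 + fs // 2] += tone

frame_len = 256
entropies = np.array([spectral_entropy(sig[i:i + frame_len])
                      for i in range(0, fs - frame_len, frame_len)])
# Fixed threshold at the mean entropy stands in for the LRT threshold.
threshold = entropies.mean()
speech = entropies < threshold             # low entropy => speech-like
print(speech.astype(int))
```

Because the decision compares relative spectral organization rather than absolute energy, the detector is comparatively insensitive to the overall noise level, which is the advantage the abstract highlights.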


International Conference on Acoustics, Speech, and Signal Processing | 2014

Automatic measurement of affective valence and arousal in speech

Meysam Asgari; Géza Kiss; Jan P. H. van Santen; Izhak Shafran; Xubo Song

Methods are proposed for measuring affective valence and arousal in speech. The methods apply support vector regression to prosodic and text features to predict human valence and arousal ratings of three stimulus types: speech, delexicalized speech, and text transcripts. Text features are extracted from transcripts via a lookup table listing per-word valence and arousal values and computing per-utterance statistics from the per-word values. Prediction of arousal ratings of delexicalized speech and of speech from prosodic features was successful, with accuracy levels not far from limits set by the reliability of the human ratings. Prediction of valence for these stimulus types as well as prediction of both dimensions for text stimuli proved more difficult, even though the corresponding human ratings were as reliable. Text based features did add, however, to the accuracy of prediction of valence for speech stimuli. We conclude that arousal of speech can be measured reliably, but not valence, and that improving the latter requires better lexical features.
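The text-feature pipeline described above — a per-word valence/arousal lookup table reduced to per-utterance statistics, then support vector regression — can be sketched as follows. The lexicon entries, utterances, and ratings below are all invented for illustration; the real system would use an established affect lexicon and human ratings.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical per-word affect lexicon (valence, arousal on a 1-9 scale),
# a stand-in for the lookup table described in the paper.
lexicon = {
    "happy": (8.2, 6.5), "sad": (2.1, 3.8), "calm": (6.9, 2.0),
    "angry": (2.5, 7.2), "table": (5.0, 3.1), "storm": (3.4, 6.9),
}

def utterance_features(words):
    """Per-utterance statistics (mean, min, max) of per-word values."""
    vals = np.array([lexicon[w] for w in words if w in lexicon])
    if len(vals) == 0:
        return np.full(6, 5.0)          # neutral default for unknown words
    return np.concatenate([vals.mean(0), vals.min(0), vals.max(0)])

utterances = [
    ["happy", "calm"], ["sad", "storm"], ["angry", "storm"],
    ["happy", "table"], ["calm", "table"], ["sad", "angry"],
]
arousal_ratings = [4.2, 5.5, 7.0, 4.8, 2.5, 5.6]  # made-up human ratings

X = np.array([utterance_features(u) for u in utterances])
model = SVR(kernel="rbf", C=10.0).fit(X, arousal_ratings)
pred = model.predict(X)
print(np.round(pred, 2))
```

The same feature vectors would feed separate regressors for valence and arousal; per the abstract, arousal is the dimension this kind of model predicts reliably.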


International Workshop on Machine Learning for Signal Processing | 2014

Inferring social contexts from audio recordings using deep neural networks

Meysam Asgari; Izhak Shafran; Alireza Bayestehtashk

In this paper, we investigate the problem of detecting social contexts from the audio recordings of everyday life such as in life-logs. Unlike the standard corpora of telephone speech or broadcast news, these recordings have a wide variety of background noise. By nature, in such applications, it is difficult to collect and label all the representative noise for learning models in a fully supervised manner. The amount of labeled data that can be expected is relatively small compared to the available recordings. This lends itself naturally to unsupervised feature extraction using sparse auto-encoders, followed by supervised learning of a classifier for social contexts. We investigate different strategies for training these models and report results on a real-world application.


International Conference on Computer Modelling and Simulation | 2009

A VQ-Based Single-Channel Audio Separation for Music/Speech Mixtures

Meysam Asgari; Mahdi Fallah; Elahe abouie Mehrizi; Ali Mostafavi

In this paper, we address the problem of audio source separation with a single sensor, based on estimating statistical models of the sources. We improve on state-of-the-art Vector Quantization (VQ) by incorporating a priori histograms of a large body of training data. This results in a more accurate codebook for each source compared with the commonly used Linde-Buzo-Gray (LBG) algorithm. An optimum estimator based on Discrete Fourier Transform (DFT) amplitudes is introduced for the separation stage. Finally, simulations demonstrate that the proposed approach efficiently separates audio mixtures in terms of the Signal-to-Distortion Ratio (SDR) measure as well as the Mean Opinion Score (MOS) criterion.
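The VQ-based separation scheme can be sketched in miniature: train one codebook of DFT-amplitude frames per source, then explain a mixture frame by the pair of codewords whose sum best matches it. Everything below is an assumption-laden toy — the spectra are synthetic, k-means stands in for the paper's histogram-refined LBG training, and the exhaustive pair search replaces their optimum estimator.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic DFT-amplitude frames for two sources with different spectral
# tilts: a low-frequency-heavy "music" source and a high-frequency-heavy
# "speech" source.
rng = np.random.default_rng(4)
n_bins = 32
music = np.abs(rng.normal(size=(300, n_bins))) * np.linspace(2.0, 0.2, n_bins)
speech = np.abs(rng.normal(size=(300, n_bins))) * np.linspace(0.2, 2.0, n_bins)

# One VQ codebook per source; k-means plays the role of LBG training here.
cb_music = KMeans(n_clusters=8, n_init=5, random_state=0).fit(music).cluster_centers_
cb_speech = KMeans(n_clusters=8, n_init=5, random_state=0).fit(speech).cluster_centers_

# Separate one mixture frame: search all codeword pairs for the sum that
# best matches the observed mixture spectrum.
mix = music[0] + speech[0]
errs = np.array([[np.linalg.norm(mix - (cm + cs)) for cs in cb_speech]
                 for cm in cb_music])
i, j = np.unravel_index(errs.argmin(), errs.shape)
est_music, est_speech = cb_music[i], cb_speech[j]
print(f"best pair ({i}, {j}), residual {errs.min():.2f}")
```

With 8-entry codebooks the pair search is only 64 comparisons per frame; realistic codebook sizes are what make the estimator in the separation stage the computational bottleneck.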
