Publication


Featured research published by Ladan Baghai-Ravary.


Archive | 2012

Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders

Ladan Baghai-Ravary; Steve W. Beet

Automatic Speech Signal Analysis for Clinical Diagnosis and Assessment of Speech Disorders provides a survey of methods designed to aid clinicians in the diagnosis and monitoring of speech disorders such as dysarthria and dyspraxia, with an emphasis on the signal processing techniques, the statistical validity of the results presented in the literature, and the appropriateness of methods that do not require specialized equipment, rigorously controlled recording procedures, or highly skilled personnel to interpret results. Such techniques offer the promise of a simple and cost-effective, yet objective, assessment of a range of medical conditions, which would be of great value to clinicians. The ideal scenario would begin with the collection of examples of the client's speech, either over the phone or using portable recording devices operated by non-specialist nursing staff. The recordings could then be analyzed initially to aid diagnosis of conditions, and subsequently to monitor the client's progress and response to treatment. The automation of this process would allow more frequent and regular assessments to be performed, as well as providing greater objectivity.


Journal of the Acoustical Society of America | 2015

The effects of delayed auditory and visual feedback on speech production

Jennifer Chesters; Ladan Baghai-Ravary; Riikka Möttönen

Monitoring the sensory consequences of articulatory movements supports speaking. For example, delaying auditory feedback of a speaker's voice disrupts speech production. There is also evidence that this disruption may be decreased by immediate visual feedback, i.e., seeing one's own articulatory movements. It is, however, unknown whether delayed visual feedback affects speech production in fluent speakers. Here, the effects of delayed auditory and visual feedback on speech fluency (i.e., speech rate and errors), vocal control (i.e., intensity and pitch), and speech rhythm were investigated. Participants received delayed (by 200 ms) or immediate auditory feedback while repeating sentences. Moreover, they received either no visual feedback, immediate visual feedback, or delayed visual feedback (by 200, 400, or 600 ms). Delayed auditory feedback affected fluency, vocal control, and rhythm. Immediate visual feedback had no effect on any of the speech measures when it was combined with delayed auditory feedback. Delayed visual feedback did, however, affect speech fluency when it was combined with delayed auditory feedback. In sum, the findings show that delayed auditory feedback disrupts fluency, vocal control, and rhythm, and that delayed visual feedback can strengthen the disruptive effect of delayed auditory feedback on fluency.
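The delayed-feedback manipulation described above can be sketched in a few lines: the speaker's signal is shifted later in time by a fixed number of samples. This is a minimal illustration of the 200 ms delay condition, not the study's actual playback apparatus; the function name and test signal are hypothetical.

```python
import numpy as np

def delay_signal(signal: np.ndarray, sample_rate: int, delay_ms: float) -> np.ndarray:
    """Return a copy of `signal` delayed by `delay_ms`, padded with silence.

    Mimics a delayed-auditory-feedback condition: the speaker hears
    their own voice this many milliseconds late.
    """
    delay_samples = int(round(sample_rate * delay_ms / 1000.0))
    delayed = np.zeros_like(signal)
    if delay_samples < len(signal):
        delayed[delay_samples:] = signal[:len(signal) - delay_samples]
    return delayed

# 1 s of a dummy 'voice' signal at 16 kHz, delayed by 200 ms
fs = 16000
voice = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
feedback = delay_signal(voice, fs, 200)
```

At 16 kHz, a 200 ms delay corresponds to 3200 samples of silence before the feedback begins.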


international conference on spoken language processing | 1996

Estimating child and adolescent formant frequency values from adult data

P. Martland; Sandra P. Whiteside; Steve W. Beet; Ladan Baghai-Ravary

The paper introduces a model being developed for estimating child and adolescent formant frequency values from adult data. The model approximates adult male and female pharyngeal and oral cavity lengths, and scales these along the corresponding male or female growth curve. The second and third formant frequencies are estimated directly from these scaled vocal tract dimensions. Two methods of establishing the first formant from the scaled data are discussed. Initial results obtained in the scaling of adult data to child values suggest that age, height and gender are all significant when estimating child formant frequency values. Furthermore, averaging of male and female data is found to be inappropriate since the differing growth rates of males and females imply that vocal tract dimensions cannot be linearly related.
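The scaling idea above can be illustrated with the classical uniform-tube approximation, in which formants vary inversely with vocal tract length. The tract lengths and the simple linear ratio below are illustrative stand-ins, not the paper's actual growth-curve model.

```python
SPEED_OF_SOUND = 35000.0  # cm/s, approximate speed of sound in warm air

def tube_formants(tract_length_cm: float, n_formants: int = 3) -> list:
    """Formant frequencies of a uniform tube closed at the glottis:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * tract_length_cm)
            for n in range(1, n_formants + 1)]

def scale_adult_formants(adult_formants, adult_length_cm, child_length_cm):
    """Estimate child formants by scaling measured adult formants by the
    ratio of vocal tract lengths (a simplification of estimating formants
    from scaled pharyngeal and oral cavity dimensions)."""
    ratio = adult_length_cm / child_length_cm
    return [f * ratio for f in adult_formants]

adult = tube_formants(17.5)                      # ~17.5 cm adult male tract
child = scale_adult_formants(adult, 17.5, 12.0)  # ~12 cm child tract
```

The example also makes the paper's caveat concrete: because male and female tract dimensions grow at different rates, a single averaged scaling ratio cannot be applied linearly across genders.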


international conference on spoken language processing | 1996

Analysis of ten vowel sounds across gender and regional/cultural accent

P. Martland; Sandra P. Whiteside; Steve W. Beet; Ladan Baghai-Ravary

The paper compares ten vowel sounds across gender and accent. Each formant for each vowel was analysed individually across data sets, but no comparison was drawn directly between the formant relationships within each vowel. The objective was to examine the formants individually across gender and accent to establish a method for transforming vowel quality in a rule-based synthesis system and thus increase its range of voices. Further, it was hoped that this would make the comparison of English formant data across differing accents simpler. Three sets of American English data were utilised in the analysis, and compared against two British English accents: received pronunciation (RP) and a general northern accent (GN). Initial findings suggest that the relative positions of certain vowel formants are particularly static across gender, with the least variation found in the second formant frequency. When accent was considered, a greater degree of variation occurred, predominantly in the mid-open and mid-closed vowel classes.


Archive | 2013

Technology and Implementation

Ladan Baghai-Ravary; Steve W. Beet

The aim of this chapter is to highlight relevant technical factors and limitations affecting the collection and interpretation of speech signals. We concentrate on the typical corruption or distortion of the speech signal which is encountered in the real world, and where possible, we include an indication of how important these effects can be. Transmission and encoding of speech signals in mobile phone networks and on the internet is almost invariably lossy, and this has an acute effect on the accuracy of speech recognition systems. Published research has also shown a comparable effect on the accuracy of dysphonia/dysarthria detection. The relationship between some specific aspects of the data collection process and the validity of assessments of new techniques is discussed. The current absence of a realistic database of remotely collected speech samples is highlighted, and adherence to standardised methods and datasets is shown to be crucial to the evaluation of new algorithms. Methods for combining multiple features into a single result are frequently required, and these too are discussed in this chapter.


international conference on acoustics, speech, and signal processing | 2010

Evidence for the strength of the relationship between Automatic Speech Recognition and Phoneme Alignment performance

Ladan Baghai-Ravary

It might naïvely be assumed that the performance of an Automatic Speech Recognition (ASR) system and that of an Automatic Speech-to-Phoneme Alignment (ASPA) system using the same acoustic-phonetic models would be closely related. However, many researchers believe this relationship to be weak at best, but this belief has not previously been tested in an objective and quantitative manner. This paper quantifies the strength of the relationship using analysis of data without reference to manually defined alignment labels. By avoiding comparison with a set of reference labels, both the ASR and the ASPA systems can be considered equivalent, removing any bias due to differences of “opinion” between the human labeller and the automatic system.
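Quantifying the strength of such a relationship typically comes down to a correlation statistic across systems. As a minimal, self-contained illustration (the per-system scores below are hypothetical, and the paper's own analysis avoids reference labels entirely), a Pearson correlation between ASR accuracy and alignment consistency could be computed as:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-system scores: ASR accuracy vs. alignment consistency
asr_accuracy = [0.71, 0.74, 0.78, 0.80, 0.85]
align_consistency = [0.62, 0.66, 0.65, 0.72, 0.78]
r = pearson(asr_accuracy, align_consistency)
```

A value of r near 1 would indicate the close relationship that the naïve assumption predicts; a value near 0 would support the "at best weak" view.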


international conference on speech and computer | 2017

VoiScan: Telephone Voice Analysis for Health and Biometric Applications

Ladan Baghai-Ravary; Steve W. Beet

The telephone, whether mobile, landline, or VoIP, is probably the most widely used form of long-distance communication. The most common use of voice biometrics is in telephone-based speaker verification, so the ability to operate effectively over the telephone is crucial. Similarly, access to vocal health monitoring, and other voice analysis technology, would benefit enormously if it were available over the telephone, via an automatic system. This paper describes a set of voice analysis algorithms, designed to be robust against the kinds of distortion and signal degradation encountered in modern telephone communication. The basis of the algorithms in traditional analysis is discussed, as are the design choices made in order to ensure robustness. The utility of these algorithms is demonstrated in a number of target domains.


Archive | 2013

Speech Production and Perception

Ladan Baghai-Ravary; Steve W. Beet

Certain specific characteristics of speech are known to be particularly useful in diagnosing speech disorders by acoustic (perceptual) and instrumental methods. The most widely cited of these are described in this chapter, along with some comments as to their suitability for use in automated systems. Some of these features can be characterised by relatively simple signal processing operations, while others would ideally require a realistic model of the higher levels of neurological processing, including cognition. It is observed that even experts who come to the same ultimate decision regarding diagnosis often differ in their assessment of individual speech characteristics. The difficulties of quantifying prosody and accurately identifying pitch epochs are highlighted because of their importance in human perception of speech disorders.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

The Inherent Temporal Precision of Phoneme Transitions

Ladan Baghai-Ravary

In natural speech, some phoneme transitions correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the acoustic transition from one phoneme to the next is gradual. In this paper we determine the naturally occurring groups of phonemes (regardless of conventional phonetic categories) which show similar characteristics in such behavior. These data-driven groupings could be used in the design of decision-trees for context-dependent phoneme clustering, as used in large-vocabulary speech recognition and alignment systems, or during the design of speech databases for speech synthesis systems. We use 128 different Hidden Markov Model phoneme alignment systems and a large corpus of British English speech to assess the consistency with which different phoneme transitions can be identified. The phoneme transitions are grouped automatically so as to minimize the statistical differences in behavior between members of each group. In this way we derive two sets of phonemic classes, one for the first phoneme of each phoneme-to-phoneme transition, and another for the second. The grouping of the phonemes confirms that broad phonetic classes are a significant indicator of the accuracy with which boundaries can be identified, but there are a number of exceptions and some apparent sub-divisions and mergers of accepted phonetic classes. The automatic grouping of the second phonemes results in two singletons, /Z/ and /N/ (in SAMPA notation). Finally, statistics are presented which characterize the precision with which transitions between these automatic classes can be identified. These could provide weightings to be applied to different transitions to provide a more realistic assessment when evaluating the relative accuracies of different alignment systems.
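The consistency measure at the heart of the approach above can be sketched simply: run several aligners over the same speech, then summarise how much each phoneme boundary's time varies across them. This is a toy stand-in for the paper's analysis of 128 HMM systems; the function name, data layout, and boundary times are all hypothetical.

```python
import statistics
from collections import defaultdict

def transition_spread(boundaries_by_system):
    """For each phoneme transition, the standard deviation (ms) of its
    boundary time across alignment systems: a proxy for the inherent
    temporal precision with which that transition can be identified."""
    per_transition = defaultdict(list)
    for system in boundaries_by_system:
        for transition, time_ms in system.items():
            per_transition[transition].append(time_ms)
    return {t: statistics.pstdev(times) for t, times in per_transition.items()}

# Toy boundary times (ms) from three hypothetical aligners: an abrupt
# /s/-/t/ transition agrees closely; a gradual /a/-/m/ transition does not.
systems = [
    {("s", "t"): 412.0, ("a", "m"): 731.0},
    {("s", "t"): 415.0, ("a", "m"): 745.0},
    {("s", "t"): 413.0, ("a", "m"): 722.0},
]
spread = transition_spread(systems)
```

Grouping transitions by similar spread values is then a clustering problem over these statistics, which is where the paper's data-driven phoneme classes come from.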


IEEE Transactions on Speech and Audio Processing | 1998

Multistep coding of speech parameters for compression

Ladan Baghai-Ravary; Steve W. Beet

This paper presents specific new techniques for coding of speech representations and a new general approach to coding for compression that directly utilizes the multidimensional nature of the input data. Many methods of speech analysis yield a two-dimensional (2-D) pattern, with time as one of the dimensions. Various such speech representations, and power spectrum sequences in particular, are shown here to be amenable to 2-D compression using specific models which take account of a large part of their structure in both dimensions. Newly developed techniques, multistep adaptive flux interpolation (MAFI) and multistep flow-based prediction (MFBP) are presented. These are able to code power spectral density (PSD) sequences of speech more completely and accurately than conventional methods. This is due to their ability to model nonstationary, but piecewise-continuous, signals, of which speech is a good example. Initially, MAFI and MFBP are applied in the time domain, then reapplied to the encoded data in the second dimension. This approach allows the coding algorithm to exploit redundancy in both dimensions, giving a significant improvement in the overall compression ratio. Furthermore, the compression may be reapplied several times. The data is further compressed with each application.
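The core idea of coding along one dimension and then re-applying the coder along the other can be illustrated with plain first-order differencing on a spectrogram-like matrix. This is a much-simplified stand-in for MAFI/MFBP, which use adaptive, flow-based interpolation rather than fixed differencing; the test signal is synthetic.

```python
import numpy as np

def two_step_residual(spectrogram: np.ndarray) -> np.ndarray:
    """First-order predictive coding applied along time, then re-applied
    to the encoded data along frequency: each value is replaced by its
    difference from the previous one in that dimension."""
    residual = np.diff(spectrogram, axis=0, prepend=spectrogram[:1])  # time
    residual = np.diff(residual, axis=1, prepend=residual[:, :1])     # frequency
    return residual

# A smooth, slowly varying PSD-like matrix compresses well: its 2-D
# residual has far less energy (hence lower entropy) than the original.
t = np.linspace(0.0, 1.0, 100)[:, None]
f = np.linspace(0.0, 1.0, 64)[None, :]
psd = np.exp(-((f - 0.3 - 0.2 * t) ** 2) / 0.02)  # a drifting spectral peak
res = two_step_residual(psd)
```

Because speech spectra are piecewise-continuous in both time and frequency, the residual after both passes is small almost everywhere, which is exactly the redundancy the two-dimensional approach exploits.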

Collaboration


Dive into Ladan Baghai-Ravary's collaborations.

Top Co-Authors

P. Martland

University of Sheffield
