Nabil N. Bitar
Boston University
Publications
Featured research published by Nabil N. Bitar.
International Conference on Acoustics, Speech, and Signal Processing | 1996
Nabil N. Bitar; Carol Y. Espy-Wilson
This paper presents acoustic parameters (APs) that were motivated by phonetic feature theory and employed as a signal representation of speech in a hidden Markov model (HMM) recognition framework. Presently, the phonetic features considered are the manner features: sonorant, syllabic, nonsyllabic, noncontinuant, and fricated. The objective of the parameters is to target the linguistic information in the signal directly and to reduce the speaker-dependent information that may yield large speech variability. To achieve these goals, the APs were defined in a relational manner across time or frequency. For evaluation, broad-class recognition experiments were conducted comparing the APs to cepstral-based parameters. The results indicate that the APs capture the phonetically relevant information in the speech signal and that, compared to the cepstral-based parameters, they are better able to reduce interspeaker variability.
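To make the idea of a relational parameter concrete, here is a minimal Python sketch of one such measure: a log ratio of energies in two frequency bands, computed frame by frame, so the value depends on spectral shape rather than absolute level. The band edges, frame length, and hop size are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def band_energy(frame, sr, lo, hi):
    """Energy of one Hann-windowed frame inside the [lo, hi) Hz band."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return spectrum[(freqs >= lo) & (freqs < hi)].sum()

def relational_ap(signal, sr, frame_len=400, hop=160):
    """Log ratio of low- to high-band energy per frame (a relational measure)."""
    aps = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        low = band_energy(frame, sr, 100, 1000)    # illustrative band edges
        high = band_energy(frame, sr, 2000, 6000)  # illustrative band edges
        aps.append(np.log10((low + 1e-10) / (high + 1e-10)))
    return np.array(aps)
```

Because the ratio cancels overall signal level, a quieter or louder rendition of the same utterance yields a similar parameter track, which is the sense in which relational measures reduce speaker-dependent variability.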
Journal of the Acoustical Society of America | 1993
Nabil N. Bitar; Armen Y. Balian; Vinay Chandra
An automatic method was developed to classify fricatives as strident/nonstrident. The method is based on relative energy measures in the frequency bands 2–4 kHz, 4–6 kHz, and 6–8 kHz. Preliminary results based on 1176 fricative sounds extracted from sentences in the TIMIT database showed that most of the fricatives /š ž s z/ were classified as strident and most of the fricatives /f v θ ð/ were classified as nonstrident. An analysis of the remaining fricatives showed that their acoustic realizations differed substantially from their canonical forms: the typically strident fricatives appeared “weak” and the typically nonstrident fricatives appeared “strong.” Factors influencing the change in the manifestation of these fricatives are phonetic context and stress. A detailed analysis of the observed variability will be presented.
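As an illustration of the relative-energy idea, the sketch below scores a fricative segment by the fraction of its spectral energy falling in the 2–8 kHz bands and thresholds that score. The 0.5 threshold and the single-window analysis are placeholder assumptions; the study's actual measures and decision rule are not reproduced here.

```python
import numpy as np

def strident_score(fricative, sr=16000):
    """Fraction of spectral energy in the 2-8 kHz bands (a relative measure)."""
    spectrum = np.abs(np.fft.rfft(fricative * np.hanning(len(fricative)))) ** 2
    freqs = np.fft.rfftfreq(len(fricative), d=1.0 / sr)
    def band(lo, hi):
        return spectrum[(freqs >= lo) & (freqs < hi)].sum()
    high = band(2000, 4000) + band(4000, 6000) + band(6000, 8000)
    return high / (spectrum.sum() + 1e-10)

def classify(fricative, sr=16000, threshold=0.5):
    """Placeholder decision rule: high-band dominance implies stridency."""
    return "strident" if strident_score(fricative, sr) > threshold else "nonstrident"
```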
Journal of the Acoustical Society of America | 1996
Nabil N. Bitar; Carol Y. Espy-Wilson
In developing acoustic parameters based on phonetic features for speaker-independent speech recognition, it is important that the parameters (1) be based on relative measures, as opposed to absolute ones, to minimize speaker-dependent effects and (2) be selected from a pool of candidates according to some “goodness” criteria. In this study, place-of-articulation phonetic features are targeted to help classify obstruent sounds. It will be shown how acoustic-phonetic knowledge and statistical analysis were combined to go from qualitative definitions of the acoustic correlates for phonetic features to computational algorithms for extracting the relevant acoustic properties, using discriminant analysis and classification trees. Further, it will be shown how using relative measures reduces speaker-dependent effects, specifically gender, on the acoustic parameters while honing in on the phonetic information contained in the speech signal. [Work supported by NSF Research Grant No. IRI-9310518.]
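The sketch below suggests how a classification tree can serve as the “goodness” filter the abstract mentions: fit a tree on candidate acoustic measures and read off which candidates it actually uses. The feature stand-ins and the random data are invented for illustration; only the technique (classification trees over candidate measures) comes from the abstract.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))     # stand-ins for candidate acoustic measures
y = rng.integers(0, 3, size=200)  # stand-ins for place classes (labial/alveolar/velar)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# An importance near zero suggests a candidate measure carries little
# discriminative information for the place distinction.
print(tree.feature_importances_)
```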
Journal of the Acoustical Society of America | 1995
Demetrios E. Paneras; Nabil N. Bitar; Carol Y. Espy-Wilson
In an effort to understand variability occurring in fairly casual speech, two experiments were conducted using the TIMIT test data, which consist of 1680 sentences spoken by 112 males and 56 females. In the first experiment, the TIMIT phonetic transcriptions were compared automatically against the phonemic transcriptions provided in an on-line dictionary. In the second experiment, the TIMIT phonetic transcriptions were mapped to the broad classes vowel, sonorant consonant, stop, fricative, and affricate, and compared to the output of an automatic broad classifier. For the types of variability found, the analysis includes: (1) the contexts in which they occur, (2) the frequency at which they occur, and (3) differences in their manifestations as a function of dialect region. Phonological rules generated from this study are discussed in the context of those cited in the literature.
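A minimal sketch of the broad-class mapping step follows, assuming standard TIMIT phone labels. The class inventory matches the abstract (vowel, sonorant consonant, stop, fricative, affricate), but the label partition shown is partial and assumed rather than taken from the study.

```python
# Partial, assumed mapping from TIMIT phone labels to broad classes.
BROAD_CLASS = {
    **dict.fromkeys(["iy", "ih", "eh", "ae", "aa", "ah", "uw", "uh", "ao",
                     "ey", "ay", "oy", "aw", "ow", "er"], "vowel"),
    **dict.fromkeys(["m", "n", "ng", "l", "r", "w", "y"], "sonorant consonant"),
    **dict.fromkeys(["p", "t", "k", "b", "d", "g"], "stop"),
    **dict.fromkeys(["f", "v", "th", "dh", "s", "z", "sh", "zh", "hh"], "fricative"),
    **dict.fromkeys(["ch", "jh"], "affricate"),
}

def to_broad(phones):
    return [BROAD_CLASS.get(p, "other") for p in phones]

print(to_broad(["dh", "ih", "s"]))  # ['fricative', 'vowel', 'fricative']
```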
Journal of the Acoustical Society of America | 1994
Nabil N. Bitar; Ramamurthy Mani; Carol Y. Espy-Wilson; S. Hamid Nawab
In this study, the feasibility of a knowledge-based approach to speaker-independent speech recognition in the presence of impulsive environmental sounds such as knocks, clinks, and claps is examined. Statistical approaches to speech recognition have had some success in dealing with steady backgrounds, probably because they have concentrated on routinely encountered steady background sounds, most of which can be modeled as white or colored noise. However, current statistical approaches are less suited to environments containing sporadic occurrences of various discrete-event sounds because of (1) the enormous variety of discrete-event sounds and (2) the fact that discrete-event sounds can be mixed with the speech signal at different loudness levels and temporal alignments. In this study, experiments are being performed on feature-based speech recognition using speech sounds (from a database of spoken telephone numbers) mixed with impulsive sounds (from a database of everyday environmental impulsive sounds). An imp...
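The mixing condition described above can be sketched as follows: overlay an impulsive event on a speech signal at a chosen temporal offset, scaled to a chosen local signal-to-noise ratio. This illustrates the experimental setup rather than reproducing the authors' code; the SNR convention (event power relative to the overlapped speech segment) is an assumption.

```python
import numpy as np

def mix(speech, event, snr_db, offset):
    """Add `event` to `speech` at sample `offset`, scaled to `snr_db`."""
    out = np.asarray(speech, dtype=float).copy()
    seg = out[offset:offset + len(event)]           # overlapped speech region
    ev = np.asarray(event[:len(seg)], dtype=float)  # truncate if it runs off the end
    p_speech = np.mean(seg ** 2) + 1e-12
    p_event = np.mean(ev ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_event * 10 ** (snr_db / 10.0)))
    out[offset:offset + len(ev)] += scale * ev
    return out
```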
Journal of the Acoustical Society of America | 1994
Nabil N. Bitar
As part of a feature-based speech recognition system, a broad classifier was developed that automatically labels the speech signal with one of the six categories silence, stop, fricative, vowel, semivowel, and nasal. The classifier is based on the detection of acoustic correlates for the linguistic features consonantal, continuant, nonsyllabic, and sonorant. Acoustic properties for the features are detected on the basis of events such as minima, maxima, or changes from a low to a high value (or vice versa) in defined signal parameters (e.g., low-frequency energy). These landmarks may mark particular instants in time, or they may define regions within the waveform. Fuzzy logic is used to represent the fact that features are manifest in the signal with varying degrees of strength. Preliminary results with the TIMIT database are promising, and our analysis shows that phenomena such as coarticulation and phonetic lenition are better handled if all of the acoustic properties are extracted in parallel...
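The landmark idea can be sketched as follows: scan one parameter track (say, low-frequency energy) for abrupt low-to-high changes and rate each with a fuzzy strength in [0, 1] instead of a hard yes/no. The slope-peak test and the strength scaling below are illustrative assumptions, not the system's actual detectors.

```python
import numpy as np

def rise_landmarks(track, scale=3.0, floor=0.2):
    """Return (index, strength) pairs at abrupt low-to-high changes in `track`."""
    diff = np.diff(track)
    marks = []
    for i in range(1, len(diff) - 1):
        # A local peak in the slope marks a candidate rise landmark.
        if diff[i] > diff[i - 1] and diff[i] >= diff[i + 1]:
            strength = float(np.clip(diff[i] / scale, 0.0, 1.0))  # fuzzy membership
            if strength > floor:
                marks.append((i + 1, strength))
    return marks
```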
Journal of the Acoustical Society of America | 1993
Armen Y. Balian; Nabil N. Bitar
A difficult problem in automated time alignment is the separation of neighboring sounds with the same manner of articulation, such as two adjacent vowels or two adjacent fricatives. Usually, a broad classifier will find only one region that contains both sounds. In this study, the automatic detection of acoustic events that separate such sequences is investigated. For example, an algorithm based on abrupt changes in the energy between 2000 and 4000 Hz separates /v/’s and /s/’s that are adjacent to /ð/’s or /s/’s. Preliminary results based on 38 cases of adjacent fricatives taken from the TIMIT database show that 68% of the automatically placed boundaries agreed with the labeled boundaries. Of the remaining 12 cases, 67% were considered to be as good as, or even better than, the TIMIT labeling. The fricative separation algorithm is presently being refined and tested on a larger database containing other similar sounds. Methods for separating adjacent stops and adjacent vowels will also be ...
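A hedged sketch of the band-energy cue: track log energy in the 2000–4000 Hz band frame by frame and place the boundary at the sharpest frame-to-frame change. The frame length, hop, and single-band decision are illustrative choices; the study's algorithm is not reproduced here.

```python
import numpy as np

def band_track(signal, sr, lo=2000, hi=4000, frame_len=256, hop=128):
    """Log energy in the [lo, hi) Hz band for each analysis frame."""
    energies = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        spec = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        energies.append(spec[(freqs >= lo) & (freqs < hi)].sum())
    return np.log10(np.array(energies) + 1e-10)

def boundary(signal, sr, hop=128):
    """Sample index of the sharpest change in 2-4 kHz band energy."""
    track = band_track(signal, sr, hop=hop)
    return (int(np.argmax(np.abs(np.diff(track)))) + 1) * hop
```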
International Conference on Acoustics, Speech, and Signal Processing | 1992
Nabil N. Bitar; S. Hamid Nawab; E. Dorken; Demetrios E. Paneras
The authors describe and demonstrate how the combined use of Wigner and short-time Fourier transform (STFT) processing can be used to advantage in a sound understanding application. Specifically, Wigner processing of a signal is used to alert the signal understanding system to the possibility of inadequate time or frequency resolution in the STFT processing of the same signal. The signal understanding system may then reprocess the signal with an STFT using a different window length. The knowledge-based control required to carry out such signal processing is provided by the sound understanding system within which this research was carried out.
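The control loop the paper describes can be caricatured in a few lines: analyze with a short window first, and when a resolution alert fires, reprocess the same signal with a longer window. The alert below is a hard-coded placeholder standing in for the paper's Wigner-based check, and the two-tone test signal is invented for illustration.

```python
import numpy as np

def stft_mag(x, win_len, hop):
    """Magnitude STFT with a Hann window."""
    frames = [x[i:i + win_len] * np.hanning(win_len)
              for i in range(0, len(x) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

t = np.arange(16000) / 16000.0
x = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 330 * t)  # two close tones

short = stft_mag(x, 128, 64)  # ~125 Hz bins: the 300/330 Hz pair is unresolved
resolution_alert = True       # placeholder for the Wigner-based alerting step
if resolution_alert:
    longer = stft_mag(x, 2048, 1024)  # ~8 Hz bins: the pair is now resolved
```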
Archive | 1998
Carol Y. Espy-Wilson; Nabil N. Bitar
Archive | 1995
Nabil N. Bitar; Carol Y. Espy-Wilson