Publication


Featured research published by John H. L. Hansen.


IEEE Transactions on Speech and Audio Processing | 2001

Nonlinear feature based classification of speech under stress

Guojun Zhou; John H. L. Hansen; James F. Kaiser

Studies have shown that variability introduced by stress or emotion can severely reduce speech recognition accuracy. Techniques for detecting or assessing the presence of stress could help improve the robustness of speech recognition systems. Although some acoustic variables derived from linear speech production theory have been investigated as indicators of stress, they are not always consistent. Three new features derived from the nonlinear Teager (1980) energy operator (TEO) are investigated for stress classification. It is believed that the TEO-based features are better able to reflect the nonlinear airflow structure of speech production under adverse stressful conditions. The features proposed include TEO-decomposed FM variation (TEO-FM-Var), normalized TEO autocorrelation envelope area (TEO-Auto-Env), and critical-band-based TEO autocorrelation envelope area (TEO-CB-Auto-Env). The proposed features are evaluated for the task of stress classification using simulated and actual stressed speech, and it is shown that the TEO-CB-Auto-Env feature substantially outperforms traditional pitch and mel-frequency cepstrum coefficients (MFCC). Performance of the TEO-based features is maintained in both text-dependent and text-independent models, while performance of traditional features degrades in text-independent models. Overall neutral versus stress classification rates are also shown to be more consistent across different stress styles.
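The Teager energy operator at the core of these features has a well-known discrete form, psi[x](n) = x(n)^2 - x(n-1)x(n+1). A minimal numpy sketch of the raw operator (the paper's TEO-CB-Auto-Env pipeline adds critical-band filtering and autocorrelation envelope processing on top of this):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone A*cos(w*n) the operator output is exactly constant at
# A^2 * sin(w)^2, so it tracks both amplitude and frequency of the signal.
n = np.arange(1000)
psi = teager_energy(0.5 * np.cos(0.2 * n))
expected = 0.25 * np.sin(0.2) ** 2
```

The operator's sensitivity to instantaneous amplitude and frequency is what motivates its use for capturing nonlinear airflow effects under stress.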


IEEE Transactions on Signal Processing | 1991

Constrained iterative speech enhancement with application to speech recognition

John H. L. Hansen; Mark A. Clements

The basis of an improved form of iterative speech enhancement for single-channel inputs is sequential maximum a posteriori estimation of the speech waveform and its all-pole parameters, followed by imposition of constraints upon the sequence of speech spectra. The approaches impose intraframe and interframe constraints on the input speech signal. Properties of the line spectral pair representation of speech allow for an efficient and direct procedure for applying many of the constraint requirements. Substantial improvement over the unconstrained method is observed in a variety of domains. Informed listener quality evaluation tests and objective speech quality measures demonstrate the techniques' effectiveness for additive white Gaussian noise. A consistent terminating point of the iterative technique is shown. The current systems result in substantially improved speech quality and linear predictive coding (LPC) parameter estimation with only a minor increase in computational requirements. The algorithms are evaluated with respect to improving automatic recognition of speech in the presence of additive noise and shown to outperform other enhancement methods in this application.
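The unconstrained scheme this work builds on alternates estimation of the all-pole speech parameters with Wiener filtering of the noisy input. A rough single-frame numpy sketch of that base loop, assuming a known noise PSD; the LSP-based intraframe and interframe constraints that are the paper's contribution are omitted:

```python
import numpy as np

def lpc_coeffs(x, order):
    """All-pole (LPC) coefficients via the autocorrelation method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def iterative_wiener(noisy, noise_psd, order=8, n_iter=4, n_fft=512):
    """Unconstrained iterative Wiener enhancement of one frame: re-estimate
    the all-pole speech spectrum from the current enhanced signal, rebuild
    the Wiener filter, and re-filter the original noisy input."""
    x = noisy.copy()
    X = np.fft.rfft(noisy, n_fft)
    for _ in range(n_iter):
        A = np.fft.rfft(lpc_coeffs(x, order), n_fft)
        gain = np.mean(x ** 2)  # crude gain term for the all-pole PSD
        speech_psd = gain / np.maximum(np.abs(A) ** 2, 1e-12)
        H = speech_psd / (speech_psd + noise_psd)  # Wiener gain in [0, 1]
        x = np.fft.irfft(H * X, n_fft)[:len(noisy)]
    return x

# Toy usage: a tone in white noise; the loop should reduce the error.
rng = np.random.default_rng(0)
n = np.arange(512)
clean = np.cos(0.3 * n)
noisy = clean + 0.3 * rng.standard_normal(512)
enhanced = iterative_wiener(noisy, noise_psd=0.09)
```

Without constraints, such loops can drift with iteration count; the paper's spectral constraints are precisely what yields the consistent terminating point noted above.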


Speech Communication | 1996

Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition

John H. L. Hansen

It is well known that the introduction of acoustic background distortion and the variability resulting from environmentally induced stress cause speech recognition algorithms to fail. In this paper, several causes for recognition performance degradation are explored. It is suggested that recent studies based on a Source Generator Framework can provide a viable foundation on which to establish robust speech recognition techniques. This research encompasses three inter-related issues: (i) analysis and modeling of speech characteristics brought on by workload task stress, speaker emotion/stress or speech produced in noise (Lombard effect), (ii) adaptive signal processing methods tailored to speech enhancement and stress equalization, and (iii) formulation of new recognition algorithms which are robust in adverse environments. An overview of a statistical analysis of a Speech Under Simulated and Actual Stress (SUSAS) database is presented. This study was conducted on over 200 parameters in the domains of pitch, duration, intensity, glottal source and vocal tract spectral variations. These studies motivate the development of a speech modeling approach entitled the Source Generator Framework in which to represent the dynamics of speech under stress. This framework provides an attractive means for performing feature equalization of speech under stress. In the second half of this paper, three novel approaches for signal enhancement and stress equalization are considered to address the issue of recognition under noisy stressful conditions. The first method employs (Auto:I,LSP:T) constrained iterative speech enhancement to address background noise and maximum likelihood stress equalization across formant location and bandwidth. The second method uses a feature-enhancing artificial neural network which transforms the input stressed speech feature set during parameterization for keyword recognition.
The final method employs morphological constrained feature enhancement to address noise and an adaptive Mel-cepstral compensation algorithm to equalize the impact of stress. Recognition performance is demonstrated for speech under a range of stress conditions, signal-to-noise ratios and background noise types.


IEEE Transactions on Speech and Audio Processing | 2000

A comparative study of traditional and newly proposed features for recognition of speech under stress

Sahar E. Bou-Ghazale; John H. L. Hansen

It is well known that the performance of speech recognition algorithms degrades in the presence of adverse environments where a speaker is under stress, emotion, or Lombard (1911) effect. This study evaluates the effectiveness of traditional features in recognition of speech under stress and formulates new features which are shown to improve stressed speech recognition. The focus is on formulating robust features which are less dependent on the speaking conditions rather than applying compensation or adaptation techniques. The stressed speaking styles considered are simulated angry and loud speech, Lombard effect speech, and noisy actual stressed speech from the SUSAS database, which is available on a CD-ROM through the NATO IST/TG-01 research group and the LDC. In addition, this study investigates the immunity of the linear prediction power spectrum and the fast Fourier transform power spectrum to the presence of stress. Our results show that, while the fast Fourier transform (FFT) power spectrum is more immune to noise, the linear prediction power spectrum is more immune than the FFT to stress as well as to a combination of noisy and stressful environments. Finally, the effects of various parameter processing choices, such as fixed versus variable preemphasis, liftering, and fixed versus cepstral mean normalization, are studied. Two alternative frequency partitioning methods are proposed and compared with traditional mel-frequency cepstral coefficients (MFCC) features for stressed speech recognition. It is shown that the alternate filterbank frequency partitions are more effective for recognition of speech under both simulated and actual stressed conditions.
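Two of the parameter-processing steps compared in this study, fixed preemphasis and cepstral mean normalization, can be stated in a few lines. A generic numpy sketch (the 0.97 coefficient is a conventional choice, not necessarily the paper's setting):

```python
import numpy as np

def preemphasize(x, alpha=0.97):
    """Fixed first-order preemphasis: y(n) = x(n) - alpha * x(n-1)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def cepstral_mean_normalization(C):
    """Subtract the per-utterance mean from every cepstral dimension
    (rows are frames, columns are cepstral coefficients)."""
    return C - C.mean(axis=0, keepdims=True)

C = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Cn = cepstral_mean_normalization(C)  # every column now has zero mean
```

Removing the per-utterance cepstral mean cancels any fixed convolutional channel effect, which is why it is a common robustness baseline in studies like this one.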


Speech Communication | 1996

Language accent classification in American English

Levent M. Arslan; John H. L. Hansen

It is well known that speaker variability caused by accent is one factor that degrades performance of speech recognition algorithms. If knowledge of speaker accent can be estimated accurately, then a modified set of recognition models which addresses speaker accent could be employed to increase recognition accuracy. In this study, the problem of language accent classification in American English is considered. A database of foreign language accent is established that consists of words and phrases that are known to be sensitive to accent. Next, isolated word and phoneme based accent classification algorithms are developed. The feature set under consideration includes Mel-cepstrum coefficients and energy, and their first order differences. It is shown that as test utterance length increases, higher classification accuracy is achieved. Isolated word strings of 7–8 words uttered by the speaker result in an accent classification rate of 93% among four different language accents. A subjective listening test is also conducted in order to compare human performance with computer algorithm performance in accent discrimination. The results show that computer based accent classification consistently achieves superior performance over human listener responses for classification. It is shown, however, that some listeners are able to match algorithm performance for accent detection. Finally, an experimental study is performed to investigate the influence of foreign accent on speech recognition algorithms. It is shown that training separate models for each accent rather than using a single model for each word can improve recognition accuracy dramatically.
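The first-order differences in the feature set above are conventionally computed as regression deltas over a short symmetric window. A generic sketch (the window width is an assumption, not a value from the paper):

```python
import numpy as np

def delta_features(C, width=2):
    """Standard regression deltas over +/- width frames:
    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2),
    with edge frames handled by repeating the first/last frame."""
    T = len(C)
    padded = np.pad(C, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    D = np.zeros_like(C, dtype=float)
    for k in range(1, width + 1):
        D += k * (padded[width + k:width + k + T] - padded[width - k:width - k + T])
    return D / denom
```

On a constant feature track the deltas are zero, and on a linearly increasing track the interior deltas equal the per-frame slope, which is the sanity check used below.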


international conference on acoustics, speech, and signal processing | 1995

Foreign accent classification using source generator based prosodic features

John H. L. Hansen; Levent M. Arslan

Speaker accent is an important issue in the formulation of robust speaker independent recognition systems. Knowledge gained from a reliable accent classification approach could improve overall recognition performance. In this paper, a new algorithm is proposed for foreign accent classification of American English. A series of experimental studies are considered which focus on establishing how speech production is varied to convey accent. The proposed method uses a source generator framework, recently proposed for analysis and recognition of speech under stress [5]. An accent sensitive database is established using speakers of American English with foreign language accents. An initial version of the classification algorithm classified speaker accent from among four different accents with an accuracy of 81.5% in the case of unknown text, and 88.9% assuming known text. Finally, it is shown that as the accent-sensitive word count increases, the ability to correctly classify accent also increases, achieving an overall classification rate of 92% among four accent classes.


IEEE Transactions on Speech and Audio Processing | 1994

Morphological constrained feature enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect

John H. L. Hansen

The use of present-day speech recognition techniques in many practical applications has demonstrated the need for improved algorithm formulation under varying acoustical environments. This paper describes a low-vocabulary speech recognition algorithm that provides robust performance in noisy environments with particular emphasis on characteristics due to the Lombard effect. A neutral and stressed-based source generator framework is established to achieve improved speech parameter characterization using a morphological constrained enhancement algorithm and stressed source compensation, which is unique for each source generator across a stressed speaking class. The algorithm uses a noise-adaptive boundary detector to obtain a sequence of source generator classes, which is used to direct noise parameter enhancement and stress compensation. This allows the parameter enhancement and stress compensation schemes to adapt to changing speech generator types. A phonetic consistency rule is also employed based on input source generator partitioning. Algorithm performance evaluation is demonstrated for noise-free and nine noisy Lombard speech conditions that include additive white Gaussian noise, slowly varying computer fan noise, and aircraft cockpit noise. System performance is compared with a traditional discrete-observation recognizer with no embellishments. Recognition rates are shown to increase from an average 36.7% for a baseline recognizer to 74.7% for the new algorithm (a 38% improvement). The new algorithm is also shown to be more consistent, as demonstrated by a decrease in standard deviation of recognition from 21.1 to 11.9 and a reduction in confusable word-pairs under noisy, Lombard-effect stressed speaking conditions.
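The morphological constrained enhancement referred to here builds on grey-scale opening and closing of spectral trajectories. A toy 1-D numpy sketch of the open-close smoothing idea (an illustration of the operators only, not the paper's constrained MCE-ACC algorithm):

```python
import numpy as np

def erode(s, w=3):
    """1-D grey-scale erosion: sliding minimum over a length-w window."""
    p = np.pad(s, w // 2, mode="edge")
    return np.array([p[i:i + w].min() for i in range(len(s))])

def dilate(s, w=3):
    """1-D grey-scale dilation: sliding maximum over a length-w window."""
    p = np.pad(s, w // 2, mode="edge")
    return np.array([p[i:i + w].max() for i in range(len(s))])

def morph_smooth(spectrum, w=3):
    """Open-close smoothing: opening (erode then dilate) removes narrow
    spikes, closing (dilate then erode) fills narrow valleys, while
    structures wider than the window survive intact."""
    opened = dilate(erode(spectrum, w), w)
    return erode(dilate(opened, w), w)

spike = np.zeros(20)
spike[10] = 5.0          # an isolated one-bin spike is removed
plateau = np.zeros(20)
plateau[5:11] = 1.0      # a broad, formant-like plateau is preserved
```

This spike/plateau behavior is what makes morphological filtering attractive for suppressing narrowband noise artifacts while preserving broad formant structure.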


IEEE Transactions on Speech and Audio Processing | 2001

Speech enhancement using a constrained iterative sinusoidal model

Jesper Jensen; John H. L. Hansen

This paper presents a sinusoidal model based algorithm for enhancement of speech degraded by additive broad-band noise. In order to ensure speech-like characteristics observed in clean speech, smoothness constraints are imposed on the model parameters using a spectral envelope surface (SES) smoothing procedure. Algorithm evaluation is performed using speech signals degraded by additive white Gaussian noise. Distortion as measured by objective speech quality scores showed a 34%-41% reduction over an SNR range of 5 to 20 dB. Objective and subjective evaluations also show considerable improvement over traditional spectral subtraction and Wiener filtering based schemes. Finally, in a subjective AB preference test, where enhanced signals were coded with the G.729 codec, the proposed scheme was preferred over the traditional enhancement schemes tested for SNRs in the range of 5 to 20 dB.
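Of the baselines mentioned, spectral subtraction is simple enough to sketch. A minimal single-frame version, assuming a known noise magnitude spectrum; practical systems operate on overlapping windowed frames with noise tracking:

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, floor=0.02):
    """Single-frame magnitude spectral subtraction with a spectral floor,
    resynthesizing with the noisy phase."""
    X = np.fft.rfft(noisy)
    mag = np.abs(X) - noise_mag
    mag = np.maximum(mag, floor * np.abs(X))  # avoid negative magnitudes
    return np.fft.irfft(mag * np.exp(1j * np.angle(X)), len(noisy))

tone = np.cos(0.25 * np.arange(256))
identity = spectral_subtraction(tone, noise_mag=0.0, floor=0.0)  # no-op case
```

The spectral floor is what controls the "musical noise" artifacts of this scheme; the sinusoidal model above avoids them differently, by constraining the parameters themselves.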


IEEE Transactions on Speech and Audio Processing | 2005

SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word

John H. L. Hansen; Rongqing Huang; Bowen Zhou; Michael Seadle; John R. Deller; Aparna Gurijala; Mikko Kurimo; Pongtep Angkititrakul

In this study, we discuss a number of issues in audio stream phrase recognition for information retrieval for a new National Gallery of the Spoken Word (NGSW). NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical tasks associated with effective audio information retrieval, including advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and natural language processing for text query requests. A number of questions regarding copyright assessment, metadata construction, and digital watermarking must also be addressed for a sustainable audio collection of this magnitude. Our experimental online system, entitled “SpeechFind”, is presented, which allows for audio retrieval from a portion of the NGSW corpus. We discuss a number of research challenges in addressing the overall task of robust phrase searching in unrestricted audio corpora.


IEEE Signal Processing Magazine | 2015

Speaker Recognition by Machines and Humans: A tutorial review

John H. L. Hansen; Taufiq Hasan

Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective.
We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.
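Among the components surveyed, voice-activity detection admits a very small baseline. A deliberately naive energy-threshold sketch (frame length and threshold are assumptions, not values from the article):

```python
import numpy as np

def energy_vad(x, frame_len=160, threshold_db=-30.0):
    """Mark a frame active when its energy is within threshold_db of the
    loudest frame. Real VADs add noise tracking and hangover smoothing."""
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

# Silence followed by a tone: only the tone frames are flagged.
signal = np.concatenate([np.zeros(320), np.cos(0.3 * np.arange(320))])
decisions = energy_vad(signal)
```

Such a detector fails in low SNR, which is why the modeling techniques reviewed in the article pair VAD with robust features and statistical speaker models.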

Collaboration


Dive into John H. L. Hansen's collaboration.

Top Co-Authors

Abhijeet Sangwan, University of Texas at Dallas
Seyed Omid Sadjadi, University of Texas at Dallas
Wooil Kim, University of Texas at Dallas
Hynek Boril, University of Texas at Dallas
Bryan L. Pellom, University of Colorado Boulder
Gang Liu, University of Texas at Dallas
Taufiq Hasan, University of Texas at Austin
Chengzhu Yu, University of Texas at Dallas