Publication


Featured research published by Mark A. Clements.


IEEE Transactions on Signal Processing | 1991

Constrained iterative speech enhancement with application to speech recognition

John H. L. Hansen; Mark A. Clements

The basis of an improved form of iterative speech enhancement for single-channel inputs is sequential maximum a posteriori estimation of the speech waveform and its all-pole parameters, followed by imposition of constraints upon the sequence of speech spectra. The approaches impose intraframe and interframe constraints on the input speech signal. Properties of the line spectral pair representation of speech allow for an efficient and direct procedure for application of many of the constraint requirements. Substantial improvement over the unconstrained method is observed in a variety of domains. Informal listener quality evaluation tests and objective speech quality measures demonstrate the technique's effectiveness for additive white Gaussian noise. A consistent terminating point of the iterative technique is shown. The current systems result in substantially improved speech quality and linear predictive coding (LPC) parameter estimation with only a minor increase in computational requirements. The algorithms are evaluated with respect to improving automatic recognition of speech in the presence of additive noise and shown to outperform other enhancement methods in this application.
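As an illustration of the inter-frame constraint idea described above, the sketch below smooths line spectral frequency (LSF) tracks across neighbouring frames, the kind of step applied between iterations so that all-pole estimates cannot change abruptly in time. The function name, smoothing rule, and toy data are assumptions for illustration, not the authors' implementation.

```python
# A minimal, self-contained sketch (not the paper's algorithm) of constraining
# LSF tracks toward their temporal neighbours between enhancement iterations.
import numpy as np

def smooth_lsf_tracks(lsfs, weight=0.5):
    """Pull each interior frame's LSF vector toward the mean of its neighbours.

    lsfs   : (n_frames, lpc_order) array, one LSF vector per analysis frame.
    weight : 0 keeps the raw estimates, 1 replaces interior frames entirely
             with their neighbours' average.
    """
    smoothed = lsfs.copy()
    for t in range(1, len(lsfs) - 1):
        neighbour_mean = 0.5 * (lsfs[t - 1] + lsfs[t + 1])
        smoothed[t] = (1.0 - weight) * lsfs[t] + weight * neighbour_mean
    return smoothed

# Toy usage: 10 frames of an order-10 LSF vector with jitter added.
rng = np.random.default_rng(0)
clean = np.linspace(0.1, 3.0, 10)                     # monotonic LSFs (radians)
tracks = clean + 0.05 * rng.standard_normal((10, 10))  # rows = frames
constrained = smooth_lsf_tracks(tracks)
```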


IEEE Transactions on Speech and Audio Processing | 1995

The challenge of spoken language systems: Research directions for the nineties

Ron Cole; L. Hirschman; L. Atlas; M. Beckman; Alan W. Biermann; M. Bush; Mark A. Clements; L. Cohen; Oscar N. Garcia; B. Hanson; Hynek Hermansky; S. Levinson; Kathleen R. McKeown; Nelson Morgan; David G. Novick; Mari Ostendorf; Sharon L. Oviatt; Patti Price; Harvey F. Silverman; J. Spiitz; Alex Waibel; Cliff Weinstein; Stephen A. Zahorian; Victor W. Zue

A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.
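A minimal, purely illustrative sketch of the three-stage loop the abstract summarizes (recognize, interpret, respond); the stage functions and toy stand-ins below are assumptions, not any system from the paper.

```python
# Illustrative skeleton of one spoken-language-system turn.
from typing import Callable

def spoken_language_turn(audio,
                         recognize: Callable,   # speech -> word sequence
                         interpret: Callable,   # words  -> application meaning
                         respond: Callable):    # meaning -> reply to the user
    words = recognize(audio)
    meaning = interpret(words)
    return respond(meaning)

# Toy stand-ins so the sketch runs end to end.
reply = spoken_language_turn(
    audio=b"...",
    recognize=lambda a: ["flights", "to", "boston"],
    interpret=lambda w: {"intent": "flight_query", "city": "boston"},
    respond=lambda m: f"Looking up {m['intent']} for {m['city']}.",
)
```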


IEEE Transactions on Biomedical Engineering | 2008

Critical Analysis of the Impact of Glottal Features in the Classification of Clinical Depression in Speech

Elliot Moore; Mark A. Clements; John W. Peifer; Lydia Weisser

This work is motivated by the current lack of objective tools for the clinical analysis of emotional disorders. The study involves the examination of a large breadth of objectively measurable features for use in discriminating depressed speech. Analysis is based on features related to prosodics, the vocal tract, and parameters extracted directly from the glottal waveform. Discrimination of the depressed speech was based on a feature selection strategy utilizing the following combinations of feature domains: prosodic measures alone, prosodic and vocal tract measures, prosodic and glottal measures, and all three domains. The combination of glottal and prosodic features produced better discrimination overall than the combination of prosodic and vocal tract features. Analysis of the discriminating feature sets used in the study reflects a clear indication that glottal descriptors are vital components of vocal affect analysis.
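A hedged sketch of the kind of feature-domain comparison described above: synthetic features from the three domains are combined and scored with a trivial leave-one-out classifier. The data, classifier, and scoring are placeholders, not the study's actual feature-selection pipeline.

```python
# Compare classification accuracy across combinations of feature domains
# (prosodic, vocal tract, glottal) on synthetic data.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_speakers = 40
domains = {
    "prosodic":    rng.standard_normal((n_speakers, 6)),
    "vocal_tract": rng.standard_normal((n_speakers, 8)),
    "glottal":     rng.standard_normal((n_speakers, 5)),
}
labels = rng.integers(0, 2, n_speakers)  # 0 = control, 1 = depressed (synthetic)

def nearest_centroid_accuracy(features, labels):
    """Leave-one-out accuracy of a simple nearest-centroid rule."""
    correct = 0
    for i in range(len(labels)):
        train = np.delete(np.arange(len(labels)), i)
        c0 = features[train][labels[train] == 0].mean(axis=0)
        c1 = features[train][labels[train] == 1].mean(axis=0)
        pred = int(np.linalg.norm(features[i] - c1) < np.linalg.norm(features[i] - c0))
        correct += int(pred == labels[i])
    return correct / len(labels)

for k in (1, 2, 3):
    for combo in combinations(domains, k):
        stacked = np.hstack([domains[d] for d in combo])
        print(combo, round(nearest_centroid_accuracy(stacked, labels), 2))
```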


Journal of the Acoustical Society of America | 1995

Analysis of the glottal excitation of emotionally styled and stressed speech

Kathleen E. Cummings; Mark A. Clements

The problems of automatic recognition and synthesis of multistyle speech have become important topics of research in recent years. This paper reports an extensive investigation of the variations that occur in the glottal excitation of eleven commonly encountered speech styles. Glottal waveforms were extracted from utterances of non-nasalized vowels for two speakers for each of the eleven speaking styles. The extracted waveforms were parametrized into four duration-related and two slope-related values. Using these six parameters, the glottal waveforms from the eleven styles of speech were analyzed both qualitatively and quantitatively. The glottal waveforms from each style of speech have been shown to be significantly and identifiably different from all other styles, thereby confirming the importance of the glottal waveform in conveying speech style information and in causing speech waveform variations. The degree of variation in styled glottal waveforms has been shown to be consistent when trained on one speaker and compared with another.
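The sketch below illustrates parametrizing a single glottal-flow cycle into duration- and slope-related values. The specific measures and the toy pulse are assumptions for illustration; they are not claimed to be the exact six parameters used in the paper.

```python
# Extract duration- and slope-related descriptors from one glottal-flow cycle.
import numpy as np

def glottal_cycle_parameters(cycle, fs):
    """cycle: one period of the glottal flow waveform; fs: sample rate in Hz."""
    peak = int(np.argmax(cycle))                    # instant of maximum flow
    closure = peak + int(np.argmin(cycle[peak:]))   # approximate closure instant
    t_open = peak / fs                              # opening-phase duration
    t_close = (closure - peak) / fs                 # closing-phase duration
    t_cycle = len(cycle) / fs                       # full cycle duration
    open_quotient = closure / len(cycle)            # fraction of cycle with open glottis
    opening_slope = (cycle[peak] - cycle[0]) / max(t_open, 1e-9)
    closing_slope = (cycle[closure] - cycle[peak]) / max(t_close, 1e-9)
    return {"t_open": t_open, "t_close": t_close, "t_cycle": t_cycle,
            "open_quotient": open_quotient,
            "opening_slope": opening_slope, "closing_slope": closing_slope}

# Toy cycle: a rough triangular glottal-flow-like pulse at 8 kHz.
cycle = np.concatenate([np.linspace(0.0, 1.0, 50),
                        np.linspace(1.0, 0.0, 20),
                        np.zeros(10)])
params = glottal_cycle_parameters(cycle, fs=8000)
```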


Signal Processing | 1988

Analysis and compensation of stressed and noisy speech with application to robust automatic recognition

John H. L. Hansen; Mark A. Clements

This work addresses the problem of automatic speech recognition in noisy, stressful environments. The main contributions include a comprehensive and unified investigation which revealed new and statistically reliable acoustic correlates of speech under stress, the formulation of a new class of constrained iterative speech enhancement algorithms, and the achievement of robust automatic speech recognition through the development of speech enhancement and stress compensation programs. The first goal of improving recognition of speech produced under stressful conditions was accomplished through extensive investigations revealing new and statistically reliable acoustic correlates of speech under stress. Analysis was performed on (i) speech with simulated stress, (ii) speech from stress-inducing workload tasks or speech in noise, and (iii) speech produced under actual stress or emotional conditions. Characteristics from five speech production domains were addressed (pitch, glottal source, duration, intensity, and vocal-tract shaping). Statistical evaluation ascertained the reliability of variation in the average, variability, and distribution of each speech parameter as a stress relayer. A new class of constrained iterative speech enhancement algorithms was formulated for the purpose of improving recognition performance in noisy environments. The new approaches apply inter- and intra-frame spectral constraints in the estimation procedure to ensure optimum speech quality across all speech classes. Constraints are applied based on the presence of perceptually important speech characteristics obtained during the enhancement procedure. The algorithms are preferable to existing techniques in several respects: (i) they result in substantially improved speech quality and parameter estimation over past techniques for additive white noise distortion, (ii) they have been extended and shown to perform well on non-stationary colored noise, and (iii) they possess a more consistent terminating criterion which was previously unavailable. The final goal of robust recognition in noisy, stressful environments was addressed based on the formulation of enhancement and stress compensation preprocessors. Enhancement preprocessors were shown to improve recognition performance for neutral speech over past enhancement techniques for all signal-to-noise ratios considered. Stress compensation algorithms are shown to reduce stress effects prior to recognition. Finally, combined speech enhancement and stress compensation preprocessing is shown to be extremely effective in reducing and even eliminating effects caused by stress and noise for robust automatic recognition.
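As a small illustration of the statistical screening of acoustic correlates mentioned above, the sketch below compares the mean, variability, and distribution of a synthetic pitch parameter between neutral and stressed productions. The data values and the particular tests are illustrative stand-ins, not the study's analysis.

```python
# Screen one speech parameter (frame-level pitch, synthetic) for reliable
# differences in average, variability, and distribution between conditions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pitch_neutral = rng.normal(120.0, 10.0, 200)    # Hz, synthetic "neutral" frames
pitch_stressed = rng.normal(135.0, 18.0, 200)   # higher mean, higher variability

mean_test = stats.ttest_ind(pitch_neutral, pitch_stressed, equal_var=False)
variance_ratio = np.var(pitch_stressed, ddof=1) / np.var(pitch_neutral, ddof=1)
dist_test = stats.ks_2samp(pitch_neutral, pitch_stressed)

print(f"mean shift p={mean_test.pvalue:.3g}, "
      f"variance ratio={variance_ratio:.2f}, "
      f"distribution shift p={dist_test.pvalue:.3g}")
```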


Computer Vision and Pattern Recognition | 2013

Decoding Children's Social Behavior

James M. Rehg; Gregory D. Abowd; Agata Rozga; Mario Romero; Mark A. Clements; Stan Sclaroff; Irfan A. Essa; Opal Ousley; Yin Li; Chanho Kim; Hrishikesh Rao; Jonathan C. Kim; Liliana Lo Presti; Jianming Zhang; Denis Lantsman; Jonathan Bidwell; Zhefan Ye

We introduce a new problem domain for activity recognition: the analysis of children's social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.


EURASIP Journal on Advances in Signal Processing | 2002

Automatic speechreading with applications to human-computer interfaces

Xiaozheng Zhang; Charles C. Broun; Russell M. Mersereau; Mark A. Clements

There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Motivated by the multimodal nature of speech, the visual component is considered to yield information that is not always present in the acoustic signal and enables improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI related applications. We first introduce a new algorithm for automatically locating the mouth region by using color and motion information and segmenting the lip region by making use of both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We present various visual feature performance comparisons to explore their impact on the recognition accuracy, including the lip inner contour and the visibility of the tongue and teeth. By using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. The experimental results based on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.
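A rough sketch of the colour-plus-motion localization idea mentioned above: a crude lip-colour score and a frame-difference motion score are combined, and the peak of the joint map is taken as a mouth candidate. The colour heuristic and toy frames are assumptions; the paper's Markov random field lip segmentation is not reproduced here.

```python
# Combine a colour cue and a motion cue to propose a mouth-region centre.
import numpy as np

def mouth_candidate(frame_rgb, prev_rgb):
    """frame_rgb, prev_rgb: (H, W, 3) float arrays in [0, 1]."""
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    colour_score = np.clip(r - 0.5 * (g + b), 0.0, None)       # crude "lip redness"
    motion_score = np.abs(frame_rgb - prev_rgb).mean(axis=-1)   # frame difference
    combined = colour_score * motion_score
    return np.unravel_index(np.argmax(combined), combined.shape)  # (row, col)

# Toy usage with random frames, just to show the call shape.
rng = np.random.default_rng(3)
prev = rng.random((120, 160, 3))
curr = rng.random((120, 160, 3))
row, col = mouth_candidate(curr, prev)
```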


Medical Engineering & Physics | 2002

Reconstruction of speech from whispers

Robert W. Morris; Mark A. Clements

This paper investigates a method for the real-time reconstruction of normal speech from whispers. This system could be used by aphonic individuals as a voice prosthesis. It could also provide improved verbal communication when normal speech is not appropriate. The normal speech is synthesized using the mixed excitation linear prediction model. Differences between whispered and phonated speech are discussed, and methods for estimating the parameters of this model from whispered speech for real-time synthesis are proposed. These include smoothing the noisy linear prediction spectra, modifying the formants, and synthesizing the excitation signal. Trade-offs between computational complexity, delay, and accuracy of the different methods are discussed.
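One piece of such a pipeline, sketched under stated assumptions: a mixed excitation (pulse train plus noise) synthesized at an imposed pitch, since the whisper itself carries no voicing. The mixing ratio and pitch values below are arbitrary; this is not the paper's MELP parameter estimation.

```python
# Stand-in mixed excitation for resynthesizing voiced speech from a whisper.
import numpy as np

def mixed_excitation(n_samples, fs, f0, voiced_fraction=0.7, seed=0):
    """Pulse train at f0 mixed with white noise, as a placeholder excitation."""
    rng = np.random.default_rng(seed)
    period = int(round(fs / f0))
    pulses = np.zeros(n_samples)
    pulses[::period] = 1.0
    noise = rng.standard_normal(n_samples) * 0.1
    return voiced_fraction * pulses + (1.0 - voiced_fraction) * noise

# The excitation would then be filtered by the smoothed, formant-shifted
# all-pole filter estimated from the whispered frame, e.g. with
# scipy.signal.lfilter([1.0], lpc_coefficients, excitation).
excitation = mixed_excitation(n_samples=1600, fs=8000, f0=110.0)
```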


International Journal of Speech Technology | 2002

Phonetic Searching vs. LVCSR: How to Find What You Really Want in Audio Archives

Peter S. Cardillo; Mark A. Clements; Michael S. Miller

A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR, with inherent problems of closed vocabulary and high word error rate), phonetic searching combines high speed and accuracy, supports open vocabulary, imposes low penalty for new words, permits phonetic and inexact spelling, enables user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one popular embodiment of LVCSR is presented along with other operating characteristics of the new technique. The current implementation for Digital Media Asset Management (DMAM) is described along with suggested applications in other domains.
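A toy sketch of the phonetic-search concept: audio is indexed as a phone sequence, and a query phone string is matched at every offset within a small edit-distance tolerance, so out-of-vocabulary words and inexact spellings can still be found. The phone strings and matching rule are illustrative assumptions, not the paper's search engine.

```python
# Approximate search of a query phone sequence within an indexed phone stream.

def edit_distance(a, b):
    """Standard Levenshtein distance between two phone sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def phonetic_search(index_phones, query_phones, max_cost=1):
    """Return start offsets where the query matches within max_cost edits."""
    hits = []
    for start in range(len(index_phones) - len(query_phones) + 1):
        window = index_phones[start:start + len(query_phones)]
        if edit_distance(window, query_phones) <= max_cost:
            hits.append(start)
    return hits

# Toy index and query (rough phone strings, hypothetical phone set).
index = "k l eh m ax n t s ay d ih d".split()
query = "k l eh m eh n t s".split()
print(phonetic_search(index, query))   # finds the near-match at offset 0
```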


IEEE Transactions on Speech and Audio Processing | 1997

Sinusoidal modeling and modification of unvoiced speech

Michael W. Macon; Mark A. Clements

Although sinusoidal models have been shown to be useful for time-scale and pitch modification of voiced speech, objectionable artifacts often arise when such models are applied to unvoiced speech. This article presents a sinusoidal model-based speech modification algorithm that preserves the natural character of unvoiced speech sounds after pitch and time-scale modification, eliminating commonly encountered artifacts. This advance is accomplished via a perceptually motivated modulation of the sinusoidal component phases that mitigates artifacts in the reconstructed signal after time-scale and pitch modification.
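A hedged illustration of the underlying idea: when unvoiced frames are modified, fully coherent sinusoidal phases make noise-like sounds turn tonal, so the component phases are perturbed before resynthesis. The random jitter below is a stand-in for, not a reproduction of, the paper's perceptually motivated phase modulation.

```python
# Perturb sinusoidal component phases before resynthesizing an unvoiced frame.
import numpy as np

def resynthesise(frequencies, amplitudes, phases, n_samples, fs):
    """Sum-of-sinusoids synthesis of one frame."""
    t = np.arange(n_samples) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p)
               for f, a, p in zip(frequencies, amplitudes, phases))

def unvoiced_phase_jitter(phases, amount=np.pi, seed=0):
    """Randomize component phases to break frame-to-frame phase coherence."""
    rng = np.random.default_rng(seed)
    return phases + rng.uniform(-amount, amount, size=len(phases))

# Toy frame: 20 noise-like components between 2 and 4 kHz at 16 kHz.
rng = np.random.default_rng(4)
freqs = rng.uniform(2000, 4000, 20)
amps = np.full(20, 0.05)
phases = rng.uniform(-np.pi, np.pi, 20)
frame = resynthesise(freqs, amps, unvoiced_phase_jitter(phases), 320, 16000)
```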

Collaboration


Dive into Mark A. Clements's collaborations.

Top Co-Authors

John H. L. Hansen (University of Texas at Dallas)
Kathleen E. Cummings (Georgia Institute of Technology)
Robert W. Morris (Georgia Institute of Technology)
Jonathan C. Kim (Georgia Institute of Technology)
David V. Anderson (Georgia Institute of Technology)
Hrishikesh Rao (Georgia Institute of Technology)
Thomas P. Barnwell (Georgia Institute of Technology)
Jon A. Arrowood (Georgia Institute of Technology)
Kaustubh Kalgaonkar (Georgia Institute of Technology)