Publications


Featured research published by Murtaza Bulut.


International Conference on Multimodal Interfaces | 2004

Analysis of emotion recognition using facial expressions, speech and multimodal information

Carlos Busso; Zhigang Deng; Serdar Yildirim; Murtaza Bulut; Chul Min Lee; Abe Kazemzadeh; Sungbok Lee; Ulrich Neumann; Shrikanth Narayanan

The interaction between human beings and computers will be more natural if computers are able to perceive and respond to human non-verbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively limited work has been done to fuse these two, and other, modalities to improve the accuracy and robustness of the emotion recognition system. This paper analyzes the strengths and the limitations of systems based only on facial expressions or acoustic information. It also discusses two approaches used to fuse these two modalities: decision-level and feature-level integration. Using a database recorded from an actress, four emotions were classified: sadness, anger, happiness, and neutral state. Markers on her face allowed detailed facial motions to be captured with a motion-capture system, in conjunction with simultaneous speech recordings. The results reveal that the system based on facial expressions performed better than the system based on acoustic information alone for the emotions considered. The results also show the complementarity of the two modalities: when they are fused, the performance and robustness of the emotion recognition system improve measurably.
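
As a concrete illustration of the two fusion strategies discussed above, the sketch below contrasts feature-level fusion (concatenating modality features before classification) with decision-level fusion (combining per-modality classifier posteriors). It uses synthetic features and off-the-shelf SVM classifiers purely as stand-ins; the paper's actual features and classifiers are not reproduced here.

```python
# Minimal sketch of feature-level vs. decision-level fusion for emotion
# classification. Illustrative only: synthetic features stand in for the
# facial-marker and acoustic features, and the SVM classifiers are an
# assumption, not the classifiers used in the paper.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d_face, d_audio = 200, 10, 6
labels = rng.integers(0, 4, size=n)                       # 0=sad, 1=angry, 2=happy, 3=neutral
X_face = rng.normal(labels[:, None], 1.0, (n, d_face))    # synthetic facial features
X_audio = rng.normal(labels[:, None], 2.0, (n, d_audio))  # synthetic acoustic features

# Feature-level fusion: concatenate modalities, train a single classifier.
feat_clf = SVC(probability=True).fit(np.hstack([X_face, X_audio]), labels)

# Decision-level fusion: one classifier per modality, combine posteriors
# (here by taking the product of class probabilities).
face_clf = SVC(probability=True).fit(X_face, labels)
audio_clf = SVC(probability=True).fit(X_audio, labels)

def decision_level_predict(xf, xa):
    p = face_clf.predict_proba(xf) * audio_clf.predict_proba(xa)
    return p.argmax(axis=1)

print("feature-level:", feat_clf.predict(np.hstack([X_face[:5], X_audio[:5]])))
print("decision-level:", decision_level_predict(X_face[:5], X_audio[:5]))
```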


Proceedings of the 2002 IEEE Workshop on Speech Synthesis | 2002

Limited domain synthesis of expressive military speech for animated characters

W. L. Johnson; Shrikanth Narayanan; R. Whitney; R. Das; Murtaza Bulut; C. LaBore

Text-to-speech synthesis can play an important role in interactive education and training applications by providing voices for animated agents. Such agents need high-quality voices capable of expressing intent and emotion. This paper presents preliminary results of an effort aimed at synthesizing expressive military speech for training applications. Such speech has acoustic and prosodic characteristics that can differ markedly from ordinary conversational speech. A limited domain synthesis approach is used, employing samples of expressive speech classified according to speaking style. The resulting synthesizer was tested both in isolation and in the context of a virtual reality training scenario with animated characters.
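
The sketch below illustrates the general idea of limited domain synthesis with style-classified samples: output is produced by looking up pre-recorded prompts indexed by phrase and speaking style. The phrases, styles, and file names are invented for illustration and are not taken from the system described in the paper.

```python
# Minimal sketch of limited-domain synthesis with style-classified samples:
# utterances are produced by looking up pre-recorded prompts indexed by
# (phrase, speaking style). Phrases, styles, and file names are hypothetical.
PROMPTS = {
    ("move out", "shouted"):      "move_out_shouted.wav",
    ("move out", "spoken"):       "move_out_spoken.wav",
    ("hold position", "shouted"): "hold_position_shouted.wav",
}

def synthesize(phrase, style="spoken"):
    """Return the recorded unit for the phrase in the requested style,
    falling back to any available style of the same phrase."""
    if (phrase, style) in PROMPTS:
        return PROMPTS[(phrase, style)]
    fallback = [wav for (p, _), wav in PROMPTS.items() if p == phrase]
    if fallback:
        return fallback[0]
    raise KeyError(f"phrase not in the limited domain: {phrase!r}")

print(synthesize("move out", "shouted"))
print(synthesize("hold position"))   # falls back to the only available style
```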


Journal of the Acoustical Society of America | 2008

On the robustness of overall F0-only modifications to the perception of emotions in speech

Murtaza Bulut; Shrikanth Narayanan

Emotional information in speech is commonly described in terms of prosody features such as F0, duration, and energy. In this paper, the focus is on how F0 characteristics can be used to effectively parametrize emotional quality in speech signals. Using an analysis-by-synthesis approach, the F0 mean, range, and shape properties of emotional utterances are systematically modified. The results show which aspects of the F0 parameter can be modified without causing significant changes in the perception of emotions. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying speech cues for judgments of emotion. The method is applied to F0 but can also be used for other aspects of prosody such as duration or loudness. A statistical analysis of the factors affecting the emotional regions, and a discussion of the effects of F0 modifications on emotion and speech quality perception, are also presented. The results show that F0 range is more important than F0 mean for emotion expression.
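
The following sketch shows the kind of overall F0 modification studied in this line of work: shifting the mean of an F0 contour and scaling its range around that mean. The contour values and factors are made up, and the resynthesis step (e.g. PSOLA) is omitted.

```python
# Minimal sketch of overall F0 modification: shift the contour mean and
# scale its excursion (range) around the mean. Values are illustrative;
# resynthesis of the modified contour is not shown.
import numpy as np

f0 = np.array([210., 220., 250., 240., 200., 190.])  # Hz, voiced frames only

def modify_f0(f0, mean_shift=0.0, range_scale=1.0):
    """Shift the contour mean by `mean_shift` Hz and scale its excursion
    around the (shifted) mean by `range_scale`."""
    mean = f0.mean()
    return (mean + mean_shift) + range_scale * (f0 - mean)

flattened = modify_f0(f0, range_scale=0.5)   # halve the F0 range
raised = modify_f0(f0, mean_shift=30.0)      # raise the mean by 30 Hz
print(flattened.round(1), raised.round(1))
```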


International Conference on Acoustics, Speech, and Signal Processing | 2007

A Statistical Approach for Modeling Prosody Features using POS Tags for Emotional Speech Synthesis

Murtaza Bulut; Sungbok Lee; Shrikanth Narayanan

Deriving statistical models for emotional speech processing is a challenging problem because of the highly varying nature of emotion expression. We address this problem by modeling prosodic parameter differences at the part-of-speech (POS) level for emotional utterances, for the purpose of emotional speech synthesis. Synthesis at the POS level is appealing because POS tags carry salient information conveying speech prominence. Analysis of energy, duration, and F0 differences between matching neutral-angry, neutral-sad, and neutral-happy utterance pairs shows that Gaussian distributions can be used to model the parameter differences. Pairwise comparisons of POS features reveal that the normalized mean and median energy of sad POS tags are more likely to be larger than those of neutral, angry, or happy POS tags. They also show that, for particular tags, angry emotion is more likely to have a higher F0 median than happy emotion, and sad emotion a higher F0 median than neutral emotion. Experiments on converting neutral speech into emotional speech using the Gaussian probability functions provide helpful insights into the application of statistical models in speech synthesis.
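
A minimal sketch of the underlying idea, assuming synthetic data: per-POS-tag differences in a prosodic parameter between emotion pairs are modeled with Gaussians, which can then be sampled to convert a neutral value toward an emotional one. The tag names and numbers below are invented for illustration.

```python
# Minimal sketch of modeling POS-level prosodic-parameter differences with
# Gaussians and sampling them for neutral-to-emotional conversion.
# Synthetic data; not the paper's corpus or feature set.
import numpy as np

rng = np.random.default_rng(1)

# Per-POS-tag differences in F0 median (angry minus neutral), e.g. measured
# over matched utterance pairs. Here: synthetic draws.
diffs = {"NOUN": rng.normal(25.0, 8.0, 50), "VERB": rng.normal(15.0, 6.0, 50)}

# Fit a Gaussian (mean, std) per tag.
models = {tag: (d.mean(), d.std()) for tag, d in diffs.items()}

def convert(f0_median_neutral, tag):
    """Sample a difference from the tag's Gaussian and add it to the
    neutral value to predict an 'angry' F0 median."""
    mu, sigma = models[tag]
    return f0_median_neutral + rng.normal(mu, sigma)

print(convert(180.0, "NOUN"), convert(150.0, "VERB"))
```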


International Conference on Acoustics, Speech, and Signal Processing | 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains

Shrikanth Narayanan; Panayiotis G. Georgiou; Abhinav Sethy; Dagen Wang; Murtaza Bulut; Shiva Sundaram; Emil Ettelaie; Sankaranarayanan Ananthakrishnan; Horacio Franco; Kristin Precoda; Dimitra Vergyri; Jing Zheng; Wen Wang; Ramana Rao Gadde; Martin Graciarena; Victor Abrash; Michael W. Frandsen; Colleen Richey

Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially targeting languages and domains that do not have readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry acoustic and language modeling needs, these designs have to accommodate varying requirements imposed by the domain needs and characteristics, the target device and usage modality (such as phrase-based or spontaneous free-form interactions, with or without visual feedback), and the huge spoken language variability arising from socio-linguistic and cultural differences among users. This paper, using case studies of creating speech translation systems between English and languages such as Pashto and Farsi, describes some of the practical issues and the solutions that were developed for multilingual ASR development. These include novel acoustic and language modeling strategies such as language-adaptive recognition, active-learning-based language modeling, and class-based language models that can better exploit resource-poor language data; efficient search strategies, including N-best and confidence generation to aid multiple-hypothesis translation; use of dialog information and careful interface choices to facilitate ASR; and audio interface design that meets both usability and robustness requirements.
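
As an illustration of one of the techniques listed above, the sketch below shows the basic class-based language model factorization, p(w_i | w_{i-1}) ≈ p(class(w_i) | class(w_{i-1})) · p(w_i | class(w_i)), which needs far fewer parameters than word bigrams and therefore suits resource-poor languages. The classes, words, and probabilities are invented.

```python
# Minimal sketch of a class-based bigram language model: the word bigram
# probability is approximated by a class bigram times a word-given-class
# probability. All entries below are hypothetical illustration values.
word_class = {"kabul": "CITY", "herat": "CITY", "monday": "DAY", "friday": "DAY"}
p_class_bigram = {("DAY", "CITY"): 0.2, ("CITY", "DAY"): 0.1}
p_word_given_class = {"kabul": 0.6, "herat": 0.4, "monday": 0.5, "friday": 0.5}

def class_bigram_prob(prev_word, word):
    c_prev, c = word_class[prev_word], word_class[word]
    return p_class_bigram.get((c_prev, c), 1e-6) * p_word_given_class[word]

print(class_bigram_prob("monday", "kabul"))   # p(CITY | DAY) * p(kabul | CITY)
```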


International Conference on Acoustics, Speech, and Signal Processing | 2008

Recognition for synthesis: Automatic parameter selection for resynthesis of emotional speech from neutral speech

Murtaza Bulut; Sungbok Lee; Shrikanth Narayanan

One of the biggest challenges in emotional speech resynthesis is the selection of modification parameters that will make listeners perceive a targeted emotion. The most reliable selection method is to use human raters; however, for large evaluation sets this process can be very costly. In this paper, we describe a recognition-for-synthesis (RFS) system to automatically select a set of possible parameter values that can be used to resynthesize emotional speech. The system, developed with supervised training, consists of synthesis (TD-PSOLA), recognition (neural network), and parameter selection modules. The experimental results show that the parameter sets selected by the RFS system can be successfully used to resynthesize the input neutral speech as angry speech, demonstrating that the RFS system can assist in the human evaluation of emotional speech.
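
A minimal sketch of a recognition-for-synthesis loop, under the assumption of placeholder synthesis and scoring functions: candidate modification parameters are applied, each result is scored by an emotion classifier, and the best-scoring parameter sets are kept for human evaluation. The `resynthesize` and `angry_score` functions below are hypothetical stand-ins, not the paper's TD-PSOLA and neural network modules.

```python
# Minimal sketch of a recognition-for-synthesis (RFS) loop: rank candidate
# modification parameters by an automatic emotion score and keep the best
# for human evaluation. Both helpers are placeholders.
import itertools

def resynthesize(neutral_speech, f0_scale, dur_scale):
    return (neutral_speech, f0_scale, dur_scale)     # placeholder "synthesis"

def angry_score(speech):
    _, f0_scale, dur_scale = speech                  # placeholder "classifier":
    return f0_scale - abs(dur_scale - 0.9)           # favors raised F0, faster speech

candidates = itertools.product([1.0, 1.2, 1.4], [0.8, 0.9, 1.0])  # F0 x duration scales
ranked = sorted(candidates,
                key=lambda p: angry_score(resynthesize("neutral.wav", *p)),
                reverse=True)
print("parameter sets to pass on to listeners:", ranked[:3])
```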


International Conference on Connected Vehicles and Expo | 2013

Camera-based heart rate monitoring in highly dynamic light conditions

Vincent Jeanne; Michel Jozef Agnes Asselman; Bert den Brinker; Murtaza Bulut

Recent advances in biomedical engineering have shown that heart rate can be monitored remotely using regular RGB cameras by analyzing minute skin color changes caused by periodic blood flow. In this paper an infrared-based alternative for light-robust camera-based heart rate measurements suitable for automotive applications is presented. The results obtained by this system show high accuracy (RMSE < 1 BPM under disco light) and a correlation score above 0.99 when compared with a reference measurement method. The proposed system enables new applications in the automotive field, especially since heart rate measurement can be integrated with other camera-based driver monitoring solutions like eye tracking.
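
The basic signal-processing principle behind camera-based heart rate monitoring, recovering a pulse rate from a periodic intensity trace via its spectrum, can be sketched as follows. The trace here is synthetic; a real pipeline would first extract and detrend a skin-region signal from the (RGB or infrared) video frames.

```python
# Minimal sketch of estimating heart rate from a periodic skin-intensity
# trace with an FFT. Synthetic signal; the frame rate and noise level are
# assumptions for illustration.
import numpy as np

fs = 30.0                                   # camera frame rate (Hz)
t = np.arange(0, 20, 1 / fs)                # 20 s of frames
pulse = 0.01 * np.sin(2 * np.pi * 1.2 * t)  # 1.2 Hz pulse, i.e. 72 BPM
signal = pulse + 0.005 * np.random.default_rng(2).normal(size=t.size)

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
band = (freqs > 0.7) & (freqs < 3.0)        # plausible heart-rate band (42-180 BPM)
bpm = 60 * freqs[band][spectrum[band].argmax()]
print(f"estimated heart rate: {bpm:.1f} BPM")
```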


Journal of the Acoustical Society of America | 2004

Effects of emotion on different phoneme classes

Chul Min Lee; Serdar Yildirim; Murtaza Bulut; Carlos Busso; Abe Kazemzadeh; Sungbok Lee; Shrikanth Narayanan

This study investigates the effects of emotion on different phoneme classes using short-term spectral features. Most research on emotion in speech has focused on prosodic features. In this study, based on the hypothesis that different emotions have varying effects on the properties of different speech sounds, we investigate the usefulness of phoneme-class-level acoustic modeling for automatic emotion classification. Hidden Markov models (HMMs) based on short-term spectral features for five broad phonetic classes are used for this purpose, with data obtained from recordings of two actresses. Each speaker produced 211 sentences with four different emotions (neutral, sad, angry, happy). Using this speech material, we trained and compared the performances of two sets of HMM classifiers: a generic set of "emotional speech" HMMs (one for each emotion) and a set of broad phonetic-class based HMMs (vowel, glide, nasal, stop, fricative) for each emotion type considered. Comparison of ...
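
A minimal sketch of emotion classification with one HMM per emotion trained on short-term spectral features, assuming the hmmlearn library and synthetic 13-dimensional "MFCC-like" frames in place of real data. The broad-phonetic-class variant would train one such model per (emotion, phonetic class) pair and score segments accordingly.

```python
# Minimal sketch of per-emotion HMM classification on spectral feature
# frames. Synthetic data and hmmlearn are assumptions, not the paper's
# exact setup.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(3)
emotions = ["neutral", "sad", "angry", "happy"]

# Train one HMM per emotion on that emotion's feature frames.
models = {}
for i, emo in enumerate(emotions):
    X = rng.normal(loc=i, scale=1.0, size=(300, 13))   # synthetic training frames
    models[emo] = GaussianHMM(n_components=3, covariance_type="diag",
                              random_state=0).fit(X)

# Classify an utterance by the highest log-likelihood model.
test = rng.normal(loc=2, scale=1.0, size=(80, 13))     # frames resembling "angry"
print(max(emotions, key=lambda e: models[e].score(test)))
```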


Ambient Intelligence | 2010

Speech Synthesis Systems in Ambient Intelligence Environments

Murtaza Bulut; Shrikanth Narayanan

This chapter discusses the state of the art in speech synthesis systems and the components necessary to incorporate ambient intelligence (AmI) characteristics in them. Spoken interaction is probably the most effective means of human communication. Speech is an essential characteristic of humans that sets them apart from other species; it has evolved to become extremely flexible, variable, and consequently very complex. A traditional speech synthesis system consists of four major components: a text generator, a text processor, a speech unit generator, and a prosody generator. A speech synthesis system that talks to the user is an example of direct communication, which can take place in many instances and for various purposes, such as alerting, informing, answering, entertaining, and educating. The conditions under which such services are provided can vary. Users also naturally vary significantly over time and in sex, age, education, experience, culture, scientific and emotional intelligence, needs, wealth, and so forth. The chapter summarizes the state of current speech synthesis technology, outlines its essential highlights and limitations, and projects future opportunities in the context of human-centric AmI interfaces. The goal is speech synthesis systems in AmI environments that can produce the right speech at the right place and time, and to the right person. This is a highly challenging task that requires multidisciplinary research in several fields.


Journal of the Acoustical Society of America | 2005

Some articulatory details of emotional speech

Sungbok Lee; Serdar Yildirim; Murtaza Bulut; Abe Kazemzadeh; Shrikanth Narayanan

Differences in speech articulation among four emotion types (neutral, anger, sadness, and happiness) are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while the subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy, and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability; it has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy; however, its articulatory activity is no less than that of neutral speech. I...
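
Two of the articulatory measures used above, displacement range and movement speed, can be computed from a sampled articulator trajectory as sketched below. The trajectory and the assumed EMA sampling rate are synthetic and for illustration only.

```python
# Minimal sketch of displacement range and movement speed for an articulator
# trajectory (e.g. tongue-tip x/y positions from EMA). Synthetic values;
# the 200 Hz sampling rate is an assumption.
import numpy as np

fs = 200.0                                              # assumed EMA sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)
traj = np.column_stack([5 * np.sin(2 * np.pi * 3 * t),  # x position (mm)
                        3 * np.cos(2 * np.pi * 3 * t)]) # y position (mm)

displacement_range = traj.max(axis=0) - traj.min(axis=0)    # per-dimension range (mm)
speed = np.linalg.norm(np.diff(traj, axis=0), axis=1) * fs  # frame-to-frame speed (mm/s)
print(displacement_range, speed.mean())
```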

Collaboration


Dive into Murtaza Bulut's collaborations.

Top Co-Authors

Shrikanth Narayanan, University of Southern California
Sungbok Lee, University of Southern California
Carlos Busso, University of Texas at Dallas
Abe Kazemzadeh, University of Southern California
Chul Min Lee, Seoul National University