
Publication


Featured research published by Murat Akbacak.


international conference on acoustics, speech, and signal processing | 2003

Environmental sniffing: noise knowledge estimation for robust speech systems

Murat Akbacak; John H. L. Hansen

We propose a framework for extracting knowledge about environmental noise from an input audio sequence and organizing this knowledge for use by other speech systems. To date, most approaches dealing with environmental noise in speech systems are based on assumptions about the noise, or differences in the collection of and training on a specific noise condition, rather than exploring the nature of the noise. We are interested in constructing a new speech framework, entitled environmental sniffing, to detect, classify and track acoustic environmental conditions. The first goal of the framework is to seek out detailed information about the environmental characteristics instead of just detecting environmental changes. The second goal is to organize this knowledge in an effective manner to allow smart decisions to direct other speech systems. Our current framework uses a number of speech processing modules including the Teager energy operator (TEO) and a hybrid algorithm with T2-BIC segmentation, noise language modeling and GMM classification in noise knowledge estimation. We define a new information criterion that incorporates the impact of noise on environmental sniffing performance. We use an in-vehicle speech and noise environment as a test platform for our evaluations and investigate the integration of environmental sniffing into an automatic speech recognition (ASR) engine in this environment. Noise classification experiments show that the hybrid algorithm achieves an error rate of 25.51%, outperforming a baseline system by an absolute 7.08%.
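
The GMM classification stage mentioned above can be pictured as a maximum-likelihood decision over per-environment Gaussian mixture models. The sketch below is illustrative only and assumes diagonal-covariance GMMs already trained on MFCC features for each noise class; it does not reproduce the authors' T2-BIC segmentation or noise language model.

```python
# Illustrative sketch (not the authors' implementation): maximum-likelihood
# noise classification with per-environment diagonal-covariance GMMs.
# Assumes MFCC feature extraction and GMM training happen elsewhere.
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of a (T, D) frame matrix under a diagonal GMM."""
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)        # (M,)
    diff = frames[:, None, :] - means[None, :, :]                        # (T, M, D)
    log_dens = log_norm - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_dens += np.log(weights)[None, :]
    # Log-sum-exp over mixture components, then sum over frames.
    max_ld = log_dens.max(axis=1, keepdims=True)
    per_frame = max_ld[:, 0] + np.log(np.exp(log_dens - max_ld).sum(axis=1))
    return per_frame.sum()

def classify_segment(frames, gmms):
    """Pick the noise label whose GMM gives the segment the highest likelihood."""
    scores = {label: diag_gmm_loglik(frames, *params) for label, params in gmms.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, M = 13, 4
    # Toy "trained" models for two noise environments: (weights, means, variances).
    gmms = {
        "highway": (np.full(M, 1.0 / M), rng.normal(0.0, 1.0, (M, D)), np.ones((M, D))),
        "idle":    (np.full(M, 1.0 / M), rng.normal(2.0, 1.0, (M, D)), np.ones((M, D))),
    }
    segment = rng.normal(2.0, 1.0, (200, D))  # 200 frames of 13-dim features
    print(classify_segment(segment, gmms))    # expected: "idle"
```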


international conference on acoustics, speech, and signal processing | 2008

Open-vocabulary spoken term detection using graphone-based hybrid recognition systems

Murat Akbacak; Dimitra Vergyri; Andreas Stolcke

We address the problem of retrieving out-of-vocabulary (OOV) words/queries from audio archives for the spoken term detection (STD) task. Many STD systems use the output of an automatic speech recognition (ASR) system which has a limited and fixed vocabulary, and are not capable of detecting rare words of high information content, such as named entities. Since such words are often of great interest for a retrieval task, it is important to index spoken archives in a way that allows a user to search an OOV query/term. In this work, we employ hybrid recognition systems which contain both words and subword units (graphones) to generate hybrid lattice indexes. We use a word-based STD system as our baseline, and present improvements by employing our proposed hybrid STD system that uses words plus graphones on the English broadcast news genre of the 2006 NIST STD task.
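
The core idea, indexing subword units so that a query absent from the recognizer's vocabulary can still be matched, can be sketched very roughly as follows. This is not the paper's system: real hybrid STD indexes word+graphone lattices with timings and posteriors, whereas the toy below indexes 1-best strings and uses character n-grams as a crude stand-in for graphones.

```python
# Illustrative sketch of subword-level indexing for out-of-vocabulary (OOV)
# spoken term detection. Character n-grams stand in for graphone units, and
# only 1-best hypotheses (no lattices, no timing) are indexed.
from collections import defaultdict

def to_subwords(text, n=3):
    """Map text to overlapping character n-grams (a proxy for graphones)."""
    s = text.replace(" ", "_").lower()
    return [s[i:i + n] for i in range(max(len(s) - n + 1, 1))]

def build_index(hypotheses):
    """hypotheses: {utterance_id: recognized_text} -> inverted subword index."""
    index = defaultdict(set)
    for utt_id, text in hypotheses.items():
        for unit in to_subwords(text):
            index[unit].add(utt_id)
    return index

def search(index, query, min_overlap=0.6):
    """Return utterances sharing at least `min_overlap` of the query's units."""
    units = to_subwords(query)
    counts = defaultdict(int)
    for unit in units:
        for utt_id in index.get(unit, ()):
            counts[utt_id] += 1
    return sorted(u for u, c in counts.items() if c / len(units) >= min_overlap)

if __name__ == "__main__":
    hyps = {
        "utt1": "the president met akbacak in washington",
        "utt2": "stock prices fell sharply today",
    }
    idx = build_index(hyps)
    print(search(idx, "akbacak"))   # OOV name still retrievable: ['utt1']
```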


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Environmental Sniffing: Noise Knowledge Estimation for Robust Speech Systems

Murat Akbacak; John H. L. Hansen

Automatic speech recognition systems work reasonably well under clean conditions but become fragile in practical applications involving real-world environments. To date, most approaches dealing with environmental noise in speech systems are based on assumptions concerning the noise, or differences in collecting and training on a specific noise condition, rather than exploring the nature of the noise. As such, speech recognition, speaker ID, or coding systems are typically retrained when new acoustic conditions are to be encountered. In this paper, we propose a new framework entitled Environmental Sniffing to detect, classify, and track acoustic environmental conditions. The first goal of the framework is to seek out detailed information about the environmental characteristics instead of just detecting environmental changes. The second goal is to organize this knowledge in an effective manner to allow smart decisions to direct subsequent speech processing systems. Our current framework uses a number of speech processing modules including a hybrid algorithm with T2-BIC segmentation, Gaussian mixture model/hidden Markov model (GMM/HMM)-based classification, and noise language modeling to achieve effective noise knowledge estimation. We define a new information criterion that incorporates the impact of noise into Environmental Sniffing performance. We use an in-vehicle speech and noise environment as a test platform for our evaluations and investigate the integration of Environmental Sniffing for automatic speech recognition (ASR) in this environment. Noise sniffing experiments show that our proposed hybrid algorithm achieves a classification error rate of 25.51%, outperforming our baseline system by 7.08%. The sniffing framework is compared to a ROVER solution for ASR using different noise-conditioned recognizers in terms of word error rate (WER) and CPU usage. Results show that the model-matching scheme using the knowledge extracted from the audio stream by Environmental Sniffing achieves better performance than a ROVER solution in both accuracy and computation. A relative 11.1% WER improvement is achieved with a relative 75% reduction in CPU resources.
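
The model-matching scheme compared against ROVER above can be pictured as simple dispatch: the noise label produced by Environmental Sniffing selects one noise-conditioned recognizer instead of running all of them and voting. The minimal sketch below is hypothetical; `sniff_noise` and the decoder callables are placeholders, not an actual ASR API.

```python
# Hypothetical sketch of "model matching": run only the recognizer trained
# for the noise condition reported by the sniffer, instead of running every
# noise-conditioned recognizer and combining outputs (the ROVER baseline).
# The sniffer and decoder callables are placeholders, not a real API.
from typing import Callable, Dict

def decode_with_matched_model(audio,
                              sniff_noise: Callable,
                              recognizers: Dict[str, Callable],
                              fallback: str = "generic"):
    """Pick one noise-conditioned recognizer based on the sniffed noise label."""
    label = sniff_noise(audio)                       # e.g. "highway", "idle"
    decoder = recognizers.get(label, recognizers[fallback])
    return label, decoder(audio)

if __name__ == "__main__":
    recognizers = {
        "highway": lambda a: "hypothesis from highway-noise model",
        "idle":    lambda a: "hypothesis from idle-noise model",
        "generic": lambda a: "hypothesis from multi-condition model",
    }
    label, hyp = decode_with_matched_model(b"...", lambda a: "highway", recognizers)
    print(label, "->", hyp)
```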


Archive | 2005

CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation

John H. L. Hansen; Xianxian Zhang; Murat Akbacak; Umit H. Yapanel; Bryan L. Pellom; Wayne H. Ward; Pongtep Angkititrakul

In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager with information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Speech from a total of 500 speakers was collected across the United States during a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature control, etc., the CU-Move system is focused on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations and built-in speaker normalization for robust ASR, and our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section with a discussion at the end of the chapter.
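
One rough way to picture the system composition described above is as a linear pipeline in which each stage consumes the previous stage's output. The sketch below uses placeholder functions for the array front-end, sniffer, recognizer, and dialog manager; it is an assumption about the data flow, not CU-Move code.

```python
# Rough pipeline sketch of the described composition: microphone-array
# front-end -> environmental sniffer -> robust ASR -> dialog manager /
# information server. Every stage here is a placeholder function.
def array_front_end(channels):
    # Beamforming / channel combination would go here; we just average channels.
    n = len(channels[0])
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]

def environmental_sniffer(audio):
    # Noise analysis would go here; return a (noise_label, snr_estimate) guess.
    return "highway", 12.0

def robust_asr(audio, noise_info):
    # A noise-aware recognizer would go here.
    return "navigate to the nearest gas station"

def dialog_manager(transcript):
    # Route-planning / information-server query would go here.
    return {"intent": "route_query", "destination": "nearest gas station"}

def process_turn(channels):
    enhanced = array_front_end(channels)
    noise_info = environmental_sniffer(enhanced)
    transcript = robust_asr(enhanced, noise_info)
    return dialog_manager(transcript)

if __name__ == "__main__":
    fake_channels = [[0.0] * 16000, [0.0] * 16000]   # two 1-second channels
    print(process_turn(fake_channels))
```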


international conference on acoustics, speech, and signal processing | 2007

Language Normalization for Bilingual Speaker Recognition Systems

Murat Akbacak; John H. L. Hansen

In this study, we focus on the problem of removing/normalizing the impact of spoken language variation in bilingual speaker recognition (BSR) systems. In addition to environment, recording, and channel mismatches, spoken language mismatch is an additional factor resulting in performance degradation in speaker recognition systems. In today's world, the number of bilingual speakers is increasing, with English becoming the universal second language. Data sparseness is becoming an important research issue when deploying speaker recognition systems with limited resources (e.g., short train/test durations). Therefore, leveraging existing resources from different languages becomes a practical concern in limited-resource BSR applications, and effective language normalization schemes are required to achieve more robust speaker recognition systems. Here, we propose two novel algorithms to address the spoken language mismatch problem: normalization at the utterance level via language identification (LID), and normalization at the segment level via multilingual phone recognition (PR). We evaluated our algorithms using a bilingual (Spanish-English) speaker set of 80 speakers. Experimental results show improvements over a baseline system which employs fusion of language-dependent speaker models with fixed weights.
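
The baseline contrasted above fuses language-dependent speaker-model scores with fixed weights; utterance-level normalization instead lets language identification set the weights per trial. The toy sketch below illustrates that contrast; all scores, posteriors, and weights are made up, and the paper's exact normalization may differ.

```python
# Toy sketch of utterance-level language normalization for a bilingual
# speaker-recognition trial: weight the language-dependent speaker-model
# scores by language-ID posteriors instead of fixed fusion weights.
def fuse_fixed(scores, weights=(0.5, 0.5)):
    """Baseline: fixed-weight fusion of (english_score, spanish_score)."""
    return weights[0] * scores["english"] + weights[1] * scores["spanish"]

def fuse_lid_weighted(scores, lid_posteriors):
    """LID-normalized fusion: weights follow P(language | test utterance)."""
    total = sum(lid_posteriors.values())
    return sum(scores[lang] * p / total for lang, p in lid_posteriors.items())

if __name__ == "__main__":
    # Log-likelihood ratios from English- and Spanish-trained speaker models.
    scores = {"english": 1.8, "spanish": -0.4}
    # The test utterance is almost certainly Spanish.
    lid = {"english": 0.1, "spanish": 0.9}
    print("fixed  :", fuse_fixed(scores))              # 0.70
    print("lid-wtd:", fuse_lid_weighted(scores, lid))  # -0.18
```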


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Spoken Proper Name Retrieval for Limited Resource Languages Using Multilingual Hybrid Representations

Murat Akbacak; John H. L. Hansen

Research in multilingual speech recognition has shown that current speech recognition technology generalizes across different languages, and that similar modeling assumptions hold, provided that linguistic knowledge (e.g., phone inventory, pronunciation dictionary, etc.) and transcribed speech data are available for the target language. Linguists make a very conservative estimate that 4000 languages are spoken today in the world, and in many of these languages, very limited linguistic knowledge and speech data/resources are available. Rapid transition to a new target language becomes a practical concern within the concept of tiered resources (e.g., different amounts of acoustically matched/mismatched data). In this paper, we present our research efforts towards multilingual spoken information retrieval with limitations in acoustic training data. We propose different retrieval algorithms to leverage existing resources from resource-rich languages as well as the target language. The proposed algorithms employ confusion-embedded hybrid pronunciation networks and lattice-based phonetic search within a proper name retrieval task. We use Latin-American Spanish as the target language by intentionally limiting available resources for this language. After searching for queries consisting of Spanish proper names in Spanish Broadcast News data, we demonstrate that retrieval performance degradations (due to data sparseness during automatic speech recognition (ASR) deployment in the target language) are compensated by employing English acoustic models. It is shown that the proposed algorithms, designed for rapid transition from resource-rich languages to underrepresented languages, achieve comparable retrieval performance using 25% of the available training data.
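
One way to read "confusion-embedded hybrid pronunciation networks" is that a query name is expanded into multiple phone sequences, each allowing likely phone confusions, before lattice search. The sketch below only illustrates that expansion step, with a hand-made confusion table and pruning; it is an assumption for illustration, not the paper's network construction.

```python
# Illustrative expansion of a query pronunciation into confusable variants
# before phonetic search, loosely in the spirit of confusion-embedded
# pronunciation networks. The confusion table, penalties, and pruning are made up.
from itertools import product

# Hypothetical phone-confusion table: phone -> list of (alternative, penalty).
CONFUSIONS = {
    "b": [("p", 0.3), ("v", 0.5)],
    "t": [("d", 0.3)],
    "s": [("z", 0.4)],
}

def expand_pronunciation(phones, max_variants=16):
    """Return (variant, cost) pairs; cost is the sum of substitution penalties."""
    options = []
    for ph in phones:
        options.append([(ph, 0.0)] + CONFUSIONS.get(ph, []))
    variants = []
    for combo in product(*options):
        seq = [ph for ph, _ in combo]
        cost = sum(c for _, c in combo)
        variants.append((seq, cost))
    variants.sort(key=lambda vc: vc[1])
    return variants[:max_variants]

if __name__ == "__main__":
    # Hypothetical phone string for a Spanish proper name.
    query = ["b", "a", "t", "i", "s", "t", "a"]
    for seq, cost in expand_pronunciation(query, max_variants=5):
        print(" ".join(seq), f"(cost {cost:.1f})")
```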


international conference on acoustics, speech, and signal processing | 2006

Spoken Proper Name Retrieval in Audio Streams for Limited-Resource Languages Via Lattice Based Search Using Hybrid Representations

Murat Akbacak; John H. L. Hansen

Research in multilingual speech recognition has shown that current speech recognition technology generalizes across different languages, and that similar modeling assumptions hold, provided that linguistic knowledge (e.g., phoneme inventory, pronunciation dictionary, etc.) and transcribed speech data are available for the target language. Linguists make a very conservative estimate that 4000 languages are spoken today in the world, and in many of these languages, very limited linguistic knowledge and speech data/resources are available. Rapid transition to a new target language becomes a practical concern within the concept of tiered resources. In this study, we present our research efforts towards multilingual spoken information retrieval with limitations in acoustic training data. We propose different retrieval algorithms to leverage existing resources from resource-rich languages as well as the target language using a lattice-based search. We use Latin-American Spanish as the target language. After searching for queries consisting of Spanish proper names in Spanish Broadcast News data, we obtain performance (max-F value of 28.3%) close to that of a Spanish-based system (trained on speech data from 36 speakers) using only 25% of all the available speech data from the original target language.
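
The max-F figure quoted above is the best F-measure obtainable by sweeping the detection threshold over scored hypotheses. The small illustration below is generic: the scores and labels are synthetic, not the paper's data, and the simplification of counting only scored hypotheses as the reference set is an assumption made for brevity.

```python
# Generic illustration of a "max-F" value: sweep a score threshold over
# detection hypotheses and keep the best F-measure. Scores/labels are toy data.
def max_f_measure(scored_hits):
    """scored_hits: list of (score, is_correct) detection hypotheses."""
    thresholds = sorted({s for s, _ in scored_hits}, reverse=True)
    total_relevant = sum(1 for _, ok in scored_hits if ok)
    best = 0.0
    for thr in thresholds:
        kept = [ok for s, ok in scored_hits if s >= thr]
        tp = sum(kept)
        precision = tp / len(kept)
        recall = tp / total_relevant if total_relevant else 0.0
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

if __name__ == "__main__":
    hits = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.2, False)]
    print(f"max-F = {max_f_measure(hits):.3f}")   # best F over all thresholds
```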


2nd Biennial Workshop on Digital Signal Processing for Mobile and Vehicular Systems, DSP 2005 | 2007

Advances in Acoustic Noise Tracking for Robust In-Vehicle Speech Systems

Murat Akbacak; John H. L. Hansen

Speech systems work reasonably well under homogeneous acoustic environmental conditions but become fragile in practical applications involving real-world environments (e.g., in-car, broadcast news, digital archives, etc.) where the audio stream contains multi-environment characteristics. To date, most approaches dealing with environmental noise in speech systems are based on assumptions concerning the noise, rather than exploring and characterizing the nature of the noise. In this chapter, we present our recent advances in the formulation and development of an in-vehicle environmental sniffing framework previously presented in [1,2,3,4]. The system is comprised of different components to detect, classify and track acoustic environmental conditions. The first goal of the framework is to seek out detailed information about the environmental characteristics instead of just detecting environmental change points. The second goal is to organize this knowledge in an effective manner to allow intelligent decisions to direct subsequent speech processing systems. After presenting our proposed in-vehicle environmental sniffing framework, we consider future directions and present discussion on supervised versus unsupervised noise clustering, and closed-set versus open-set noise classification.
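
The closing discussion of supervised versus unsupervised noise clustering can be illustrated on the unsupervised side: group noise segments by their feature statistics without predefined labels. The toy k-means sketch below over per-segment feature means is an assumed illustration, not the chapter's algorithm.

```python
# Toy illustration of unsupervised noise clustering: k-means over per-segment
# feature means, with no predefined noise labels. Not the chapter's algorithm.
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each segment to its nearest cluster center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster went empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Per-segment mean feature vectors from two unlabeled noise conditions.
    segments = np.vstack([rng.normal(0, 0.5, (30, 13)), rng.normal(3, 0.5, (30, 13))])
    labels, _ = kmeans(segments, k=2)
    print(labels)   # two groups recovered without any noise labels
```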


conference of the international speech communication association | 2016

Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application.

Stephanie Pancoast; Murat Akbacak

In addition to the increasing number of publicly available multimedia documents generated and searched every day, there are also large corpora of personalized videos, images, and spoken recordings stored on users' private devices and/or in their personal accounts in the cloud. Retrieving spoken items via voice commonly involves supervised indexing approaches such as large-vocabulary speech recognition. When these items are personalized recordings, diverse and personalized content causes recognition systems to experience mismatches, mostly in the vocabulary and language model components, and sometimes even in the language users speak. All of these contribute to the retrieval task performing very poorly. Alternatively, common audio patterns can be captured and used for exemplar-based retrieval in an unsupervised fashion, but this approach has its limitations as well. In this work we explore supervised, unsupervised, and fusion techniques to perform retrieval of short personalized spoken utterances. On a small collection of personal recordings, we find that when fusing word, phoneme, and unsupervised frame-based systems, we can improve accuracy on the top retrieved item by approximately 3% over the best-performing individual system. Besides demonstrating this improvement on our initial collection, we hope to attract the community's interest to such novel personalized retrieval applications.
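
The fusion described above can be pictured as a weighted combination of normalized per-item scores from the word, phoneme, and unsupervised frame-based systems. The sketch below is a generic late-fusion illustration; the weights, normalization, and scores are made up and may differ from the paper's setup.

```python
# Illustrative late fusion of per-item retrieval scores from three systems
# (word-based, phoneme-based, unsupervised frame-based). Weights and scores
# are made up for demonstration.
def normalize(scores):
    """Min-max normalize a {item: score} dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {item: (s - lo) / span for item, s in scores.items()}

def fuse(system_scores, weights):
    fused = {}
    for name, scores in system_scores.items():
        for item, s in normalize(scores).items():
            fused[item] = fused.get(item, 0.0) + weights[name] * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    system_scores = {
        "word":    {"rec_a": 3.1, "rec_b": 2.0, "rec_c": 0.5},
        "phoneme": {"rec_a": 0.6, "rec_b": 0.9, "rec_c": 0.2},
        "frame":   {"rec_a": 0.1, "rec_b": 0.8, "rec_c": 0.7},
    }
    weights = {"word": 0.5, "phoneme": 0.3, "frame": 0.2}
    print(fuse(system_scores, weights))   # top entry is the fused best match
```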


conference of the international speech communication association | 2012

Bag-of-Audio-Words Approach for Multimedia Event Classification.

Stephanie Pancoast; Murat Akbacak
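
A bag-of-audio-words representation, as the title references, typically quantizes frame-level features against a learned codebook and describes a clip by the histogram of assigned codewords. The sketch below is a generic illustration of that idea with random data, not the paper's pipeline.

```python
# Generic bag-of-audio-words sketch: assign each frame's feature vector to its
# nearest codebook entry and describe the clip by the normalized histogram of
# codeword counts. Codebook and features here are random, for illustration only.
import numpy as np

def bag_of_audio_words(frames, codebook):
    """frames: (T, D) features; codebook: (K, D) centers -> (K,) histogram."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    assignments = d.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(64, 13))      # e.g. k-means centers learned offline
    clip = rng.normal(size=(500, 13))         # 500 frames of 13-dim features
    print(bag_of_audio_words(clip, codebook).shape)   # (64,) clip-level descriptor
```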

Collaboration


Dive into Murat Akbacak's collaboration.

Top Co-Authors

John H. L. Hansen
University of Texas at Dallas

Bryan L. Pellom
University of Colorado Boulder

Bowen Zhou
University of Colorado Boulder