Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Murat Saraclar is active.

Publication


Featured research published by Murat Saraclar.


Computer Speech & Language | 2007

Discriminative n-gram language modeling

Brian Roark; Murat Saraclar; Michael Collins

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on maximizing the regularized conditional log-likelihood. The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We describe a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone. The final system achieves a 1.8% absolute reduction in WER for a baseline first-pass recognition system (from 39.2% to 37.4%), and a 0.9% absolute reduction in WER for a multi-pass recognition system (from 28.9% to 28.0%).
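
At its core, the perceptron step is a structured-perceptron update over recognizer hypotheses. The following is a minimal, illustrative Python sketch that reranks n-best lists with bigram count features; all names are hypothetical, and the paper itself applies the model by intersecting weighted automata with full word lattices and uses the averaged perceptron, so treat this as a toy stand-in.

    from collections import Counter

    def bigram_feats(words):
        """Count bigram features in a hypothesis (the paper uses n-grams)."""
        return Counter(zip(words, words[1:]))

    def score(weights, words, base_score):
        """Baseline recognizer score plus learned feature weights."""
        return base_score + sum(weights[g] * c
                                for g, c in bigram_feats(words).items())

    def perceptron_epoch(weights, data):
        """One structured-perceptron pass over (n-best list, reference) pairs."""
        for hyps, reference in data:
            # hyps: list of (word_sequence, baseline_score) pairs
            best = max(hyps, key=lambda h: score(weights, h[0], h[1]))[0]
            if best != reference:
                # Promote features of the reference, demote those of the error.
                weights.update(bigram_feats(reference))
                weights.subtract(bigram_feats(best))
        return weights

    w = perceptron_epoch(Counter(), [([(["the", "cat"], -1.0),
                                       (["a", "cat"], -0.5)], ["the", "cat"])])

Averaging the weight vector over updates, which this sketch omits, is what makes the small selected feature set generalize.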


Meeting of the Association for Computational Linguistics | 2004

Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm

Brian Roark; Murat Saraclar; Michael Collins; Mark Johnson

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with its weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.
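
The CRF step replaces the perceptron update with gradient ascent on a globally normalized conditional log-likelihood. A rough, self-contained sketch of the per-utterance gradient over an n-best approximation follows; the names are hypothetical, and the paper computes these expectations over full lattices with finite-state operations rather than explicit lists.

    import math
    from collections import Counter

    def posteriors(weights, hyps):
        """Conditional distribution over hypotheses in one n-best list.

        hyps: list of (feature Counter, baseline score) pairs."""
        scores = [base + sum(weights[g] * c for g, c in feats.items())
                  for feats, base in hyps]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def cll_gradient(weights, hyps, ref_idx, l2=0.1):
        """Gradient of the regularized conditional log-likelihood for one
        utterance: observed feature counts minus expected counts, minus the
        derivative of the (l2/2)*||w||^2 penalty."""
        probs = posteriors(weights, hyps)
        grad = Counter(hyps[ref_idx][0])            # observed counts
        for p, (feats, _) in zip(probs, hyps):
            for g, c in feats.items():
                grad[g] -= p * c                    # expected counts
        for g, w in weights.items():
            grad[g] -= l2 * w
        return grad

The abstract's recipe corresponds to initializing the weights with the perceptron output and restricting features to those the perceptron selected.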


IEEE Signal Processing Magazine | 2008

Retrieval and browsing of spoken content

Ciprian Chelba; Timothy J. Hazen; Murat Saraclar

Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard drive. Speech search has received less attention, perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor to large-scale access to spoken content. In this article, we strive to discuss the technical issues involved in the development of information retrieval systems for spoken audio documents, concentrating on the issue of handling the errorful or incomplete output provided by ASR systems. We focus on the usage case where a user enters search terms into a search engine and is returned a collection of spoken document hits.
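
A recurring theme in spoken-content retrieval is not trusting the 1-best transcript. One common way to hedge against ASR errors, not necessarily the exact method surveyed in this article, is to index expected term counts computed from posterior-weighted recognition alternatives; a toy sketch with a hypothetical data layout:

    from collections import defaultdict

    def index_expected_counts(docs):
        """Build term -> {doc_id: expected count} from ASR alternatives.

        docs: dict mapping doc_id to a list of word slots, where each slot
        is a list of (word, posterior) alternatives, e.g. from a confusion
        network."""
        index = defaultdict(lambda: defaultdict(float))
        for doc_id, slots in docs.items():
            for alternatives in slots:
                for word, post in alternatives:
                    index[word][doc_id] += post  # expected count, not 0/1
        return index

    # Scoring a query term by its expected count per document degrades
    # gracefully when the 1-best transcript is wrong.
    docs = {"lecture1": [[("speech", 0.7), ("beach", 0.3)], [("search", 1.0)]]}
    idx = index_expected_counts(docs)
    print(idx["speech"]["lecture1"])  # 0.7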


Computer Speech & Language | 2000

Pronunciation modeling by sharing Gaussian densities across phonetic models

Murat Saraclar; Harriet J. Nock; Sanjeev Khudanpur

Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to improve accuracy. This paper describes the incorporation of pronunciation models into acoustic model training in addition to recognition. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first illustrated: it is shown that simply improving the accuracy of the phonetic transcription used for acoustic model training is of little benefit. Acoustic models trained on the most accurate phonetic transcriptions result in worse recognition than acoustic models trained on canonical baseforms. Analysis of this counterintuitive result leads to a new method of accommodating nonstandard pronunciations: rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones, the hidden Markov model (HMM) states of the phoneme's model are instead allowed to share Gaussian mixture components with the HMM states of the model(s) of the alternate realization(s). Qualitatively, this amounts to making a soft decision about which surface form is realized. Quantitatively, experiments show that this method is particularly well suited for acoustic model training for spontaneous speech: a 1.7% (absolute) improvement in recognition accuracy on the Switchboard corpus is presented.
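
The key trick, sharing Gaussian mixture components across the HMM states of canonical and alternate phone models, can be sketched as a tied pool of Gaussians with per-state mixture weights. This is a heavy simplification (real systems use multi-dimensional diagonal-covariance GMMs trained with Baum-Welch), and every name below is hypothetical.

    import math

    def log_gauss(x, mean, var):
        """Log density of a 1-D Gaussian (real systems use ~39-D GMMs)."""
        return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

    # One global pool of Gaussians; states reference components by index.
    POOL = [(0.0, 1.0), (1.5, 0.8), (3.0, 1.2)]   # (mean, var) per component

    # A state of the canonical phone model and a state of an alternate
    # realization's model share component 1, softly blending the two.
    STATE_CANONICAL = {0: 0.6, 1: 0.4}            # component index -> weight
    STATE_ALTERNATE = {1: 0.5, 2: 0.5}

    def state_loglik(state, x):
        """Log-likelihood of observation x under a state's tied mixture."""
        return math.log(sum(w * math.exp(log_gauss(x, *POOL[k]))
                            for k, w in state.items()))

    print(state_loglik(STATE_CANONICAL, 1.0), state_loglik(STATE_ALTERNATE, 1.0))

The soft decision the abstract describes falls out of the mixture weights: instead of picking one surface form outright, each state's likelihood blends both.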


North American Chapter of the Association for Computational Linguistics | 2004

General indexation of weighted automata: application to spoken utterance retrieval

Cyril Allauzen; Mehryar Mohri; Murat Saraclar

Much of the massive quantity of digitized data widely available, e.g., text, speech, and handwritten sequences, is either given directly or, as a result of some prior processing, represented as weighted automata. These are compact representations of a large number of alternative sequences and their weights reflecting the uncertainty or variability of the data. Thus, the indexation of such data requires indexing weighted automata. We present a general algorithm for the indexation of weighted automata. The resulting index is represented by a deterministic weighted transducer that is optimal for search: the search for an input string takes time linear in the sum of the size of that string and the number of indices of the weighted automata where it appears. We also introduce a general framework based on weighted transducers that generalizes this indexation to enable the search for more complex patterns including syntactic information or for different types of sequences, e.g., word sequences instead of phonemic sequences. The use of this framework is illustrated with several examples. We applied our general indexation algorithm and framework to the problem of indexation of speech utterances and report the results of our experiments in several tasks, demonstrating that our techniques yield comparable results to previous methods, while providing greater generality, including the possibility of searching for arbitrary patterns represented by weighted automata.
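
The index itself is a deterministic weighted transducer built with finite-state algorithms, and lookups touch only the query and its hits. As a drastically simplified illustration of the same idea, the sketch below indexes every factor (contiguous substring) of weighted strings in a hash table; this ignores the automata-theoretic construction entirely, and all names are invented.

    from collections import defaultdict

    def index_factors(utterances):
        """Map every factor (contiguous substring of words) to occurrences.

        utterances: dict of utt_id -> (words, weight); a stand-in for
        indexing one weighted path instead of a full weighted automaton."""
        index = defaultdict(list)
        for utt_id, (words, weight) in utterances.items():
            for i in range(len(words)):
                for j in range(i + 1, len(words) + 1):
                    index[tuple(words[i:j])].append((utt_id, weight))
        return index

    utts = {"u1": (["speech", "retrieval", "systems"], 0.9)}
    idx = index_factors(utts)
    # Lookup cost depends only on the query and its posting list, mirroring
    # the paper's optimality claim (linear in query size plus hit count).
    print(idx[("speech", "retrieval")])  # [('u1', 0.9)]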


ACM Transactions on Speech and Language Processing | 2007

Morph-based speech recognition and modeling of out-of-vocabulary words across languages

Mathias Creutz; Teemu Hirsimäki; Mikko Kurimo; Antti Puurula; Janne Pylkkönen; Vesa Siivola; Matti Varjokallio; Ebru Arisoy; Murat Saraclar; Andreas Stolcke

We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. It is shown that the morph models do perform fairly well on OOVs without compromising the recognition accuracy on in-vocabulary words. The Arabic experiment is the only exception: there, the standard word model outperforms the morph model. Differences in the datasets and the amount of data are discussed as a plausible explanation.
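
The mechanism that lets morph models cover unseen word forms is concatenation of subword units. The toy sketch below shows the two ingredients: segmenting a word into morphs from a fixed lexicon and scoring the morph sequence with a bigram model. Morfessor learns the lexicon unsupervised; this greedy segmenter and the probabilities are illustrative only.

    import math

    def segment(word, morphs):
        """Greedy longest-match segmentation of a word into known morphs."""
        units, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in morphs:
                    units.append(word[i:j])
                    i = j
                    break
            else:
                return None  # unsegmentable with this lexicon
        return units

    # A Turkish-flavored toy lexicon: stem plus plural/possessive/case
    # suffixes; "evlerimde" means "in my houses".
    MORPHS = {"ev", "ler", "im", "de"}
    BIGRAM = {("<s>", "ev"): 0.5, ("ev", "ler"): 0.4,
              ("ler", "im"): 0.3, ("im", "de"): 0.6}

    def logprob(units, bigram):
        seq = ["<s>"] + units
        return sum(math.log(bigram.get((a, b), 1e-6))
                   for a, b in zip(seq, seq[1:]))

    units = segment("evlerimde", MORPHS)
    print(units, logprob(units, BIGRAM))  # ['ev', 'ler', 'im', 'de'] ...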


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Lattice Indexing for Spoken Term Detection

Dogan Can; Murat Saraclar

This paper considers the problem of constructing an efficient inverted index for the spoken term detection (STD) task. More specifically, we construct a deterministic weighted finite-state transducer storing soft-hits in the form of (utterance ID, start time, end time, posterior score) quadruplets. We propose a generalized factor transducer structure which retains the time information necessary for performing STD. The required information is embedded into the path weights of the factor transducer without disrupting the inherent optimality. We also describe how to index all substrings seen in a collection of raw automatic speech recognition lattices using the proposed structure. Our STD indexing/search implementation is built upon the OpenFst Library and is designed to scale well to large problems. Experiments on Turkish and English data sets corroborate our claims.
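
Stripped of the transducer machinery, the content of the index is easy to picture: each factor of each lattice maps to a list of soft hits. A minimal Python stand-in for that posting structure is shown below; the actual paper stores it inside a deterministic weighted factor transducer built with OpenFst, not a dictionary, and the sample data is invented.

    from collections import defaultdict

    def build_std_index(lattice_hits):
        """Collect soft hits per term.

        lattice_hits: iterable of (term, utt_id, start, end, posterior),
        e.g. produced by computing arc posteriors over each ASR lattice.
        Each posting is the (utterance ID, start time, end time, posterior
        score) quadruplet the paper's index stores."""
        index = defaultdict(list)
        for term, utt, t0, t1, post in lattice_hits:
            index[term].append((utt, t0, t1, post))
        return index

    def search(index, term, threshold=0.1):
        """Return soft hits above a posterior threshold, best first."""
        return sorted((h for h in index.get(term, []) if h[3] >= threshold),
                      key=lambda h: -h[3])

    idx = build_std_index([("ankara", "u1", 3.2, 3.7, 0.82),
                           ("ankara", "u2", 10.1, 10.6, 0.07)])
    print(search(idx, "ankara"))  # only the u1 hit survives the threshold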


International Conference on Acoustics, Speech, and Signal Processing | 2005

The AT&T WATSON speech recognizer

Vincent Goffin; Cyril Allauzen; Enrico Bocchieri; Dilek Hakkani-Tür; Andrej Ljolje; Sarangarajan Parthasarathy; Mazin G. Rahim; Giuseppe Riccardi; Murat Saraclar

This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient search. We identify the algorithms used for high-accuracy, real-time, and low-latency recognition. We present results for small and large vocabulary tasks taken from the AT&T VoiceTone® service, showing word accuracy improvement of about 5% absolute and real-time processing speed-up by a factor between 2 and 3.
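
The core of real-time, low-latency decoding in recognizers of this kind is time-synchronous Viterbi search with beam pruning over the optimized recognition network. The bare-bones, hypothetical sketch below shows only that pruning loop; WATSON's decoder involves many further techniques the abstract only alludes to.

    import math

    def viterbi_beam(network, start, frames, beam=10.0):
        """Time-synchronous Viterbi search with beam pruning.

        network: dict state -> list of (next_state, transition_logprob)
        frames:  one acoustic scorer per frame, state -> log-likelihood"""
        active = {start: 0.0}                 # surviving states -> best score
        for acoustic in frames:
            nxt = {}
            for state, sc in active.items():
                for succ, logp in network.get(state, []):
                    s = sc + logp + acoustic(succ)
                    if s > nxt.get(succ, -math.inf):
                        nxt[succ] = s
            if not nxt:
                break
            best = max(nxt.values())
            # Beam pruning: drop states far below the best partial path.
            active = {st: sc for st, sc in nxt.items() if sc > best - beam}
        return active

A real decoder also records backpointers to recover the word sequence; the beam width is the main knob trading accuracy against the speed-ups the paper reports.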


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Turkish Broadcast News Transcription and Retrieval

Ebru Arisoy; Dogan Can; Siddika Parlak; Hasim Sak; Murat Saraclar

This paper summarizes our recent efforts to build a Turkish Broadcast News transcription and retrieval system. The agglutinative nature of Turkish leads to a high number of out-of-vocabulary (OOV) words, which in turn lowers automatic speech recognition (ASR) accuracy. This situation compromises the performance of speech retrieval systems based on ASR output, so a word-based ASR system alone is not adequate for transcribing speech in Turkish. To alleviate this problem, various sub-word recognition units are utilized. These units solve the OOV problem with moderate-size vocabularies and, in terms of recognition accuracy, perform even better than a 500K-word vocabulary. As a novel approach, the interaction between recognition units (words and sub-words) and discriminative training is explored. Sub-word models benefit from discriminative training more than word models do, especially in the discriminative language modeling framework. For speech retrieval, a spoken term detection system based on automata indexation is utilized. As with transcription, retrieval performance is measured under various schemes incorporating words and sub-words. The best results are obtained using a cascade of word and sub-word indexes together with term-specific thresholding.
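
The best-performing retrieval configuration, a cascade of word and sub-word indexes with term-specific thresholds, amounts to a simple fallback policy at query time. A schematic sketch with invented data structures follows (plain dicts stand in for the automata indexes, and the segmenter is a stub):

    def cascade_search(term, word_index, word_vocab, subword_index, segment,
                       thresholds, default_thr=0.5):
        """Query the word index for in-vocabulary terms; fall back to the
        sub-word index (querying the term's morph sequence) when it is OOV."""
        if term in word_vocab:
            hits = word_index.get(term, [])
        else:
            hits = subword_index.get(tuple(segment(term)), [])
        thr = thresholds.get(term, default_thr)   # term-specific threshold
        return [(utt, score) for utt, score in hits if score >= thr]

    # Toy data: "istanbullu" is OOV for the word index, so the morph index
    # answers the query instead.
    word_vocab = {"ankara"}
    word_index = {"ankara": [("u1", 0.9)]}
    subword_index = {("istanbul", "lu"): [("u2", 0.6)]}
    segment = lambda t: ["istanbul", "lu"]  # stand-in for a morph segmenter
    print(cascade_search("istanbullu", word_index, word_vocab,
                         subword_index, segment, {"istanbullu": 0.4}))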


International Conference on Acoustics, Speech, and Signal Processing | 2008

Spoken term detection for Turkish Broadcast News

Siddika Parlak; Murat Saraclar

In this paper, we present a baseline spoken term detection (STD) system for Turkish broadcast news. The agglutinative structure of Turkish causes a high out-of-vocabulary (OOV) rate and increases the word error rate (WER) in automatic speech recognition. Several approaches are investigated to reduce this negative effect on the STD system. Sub-word units are used to handle OOV queries, and lattice-based indexing is used to obtain different operating points and handle high-WER cases. A recently proposed method for setting term-specific thresholds is also evaluated and extended to allow us to choose an operating point suitable for our needs. The best results are obtained by using a cascade of word and sub-word lattice indices with term-specific thresholding.
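
Term-specific thresholding can be motivated directly from the term-weighted value (TWV) metric used to score STD systems: accepting a hit with posterior p gains roughly p/Ntrue in detections but costs about beta*(1-p)/T in false alarms, so the break-even point differs per term. One common formulation is sketched below (the paper's extension for choosing other operating points is not reproduced); since Ntrue is unknown at search time, it is estimated by the sum of the term's hit posteriors.

    def term_threshold(posteriors, speech_seconds, beta=999.9):
        """Per-term decision threshold derived from the TWV trade-off.

        Accept a hit with posterior p when p / Ntrue > beta * (1 - p) / T,
        i.e. p > beta * Ntrue / (T + beta * Ntrue), with Ntrue estimated by
        the sum of the term's hit posteriors."""
        n_true = sum(posteriors)                 # estimated true count
        return beta * n_true / (speech_seconds + beta * n_true)

    hits = [0.9, 0.6, 0.2]                       # posteriors for one term
    thr = term_threshold(hits, speech_seconds=3600)
    print(thr, [p for p in hits if p >= thr])    # ~0.32; keeps 0.9 and 0.6

Rare terms get lower thresholds and frequent terms higher ones, which is exactly the behavior that makes a single global threshold suboptimal.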

Collaboration


Dive into Murat Saraclar's collaborations.

Top Co-Authors

Cemil Demir (Scientific and Technological Research Council of Turkey)
Dogan Can (University of Southern California)