
Publications


Featured research published by Özgür Çetin.


ACM Transactions on Speech and Language Processing | 2007

Web resources for language modeling in conversational speech recognition

Ivan Bulyko; Mari Ostendorf; Man-Hung Siu; Tim Ng; Andreas Stolcke; Özgür Çetin

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
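The mixture-model idea above can be illustrated by linearly interpolating bigram estimates from an in-domain corpus and a web-derived corpus. The toy corpora, the floor value, and the interpolation weight below are invented for illustration; the paper estimates its mixture weights on real held-out data:

```python
from collections import Counter

def bigram_probs(tokens):
    """Maximum-likelihood bigram probabilities from a token list."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    hist_counts = Counter(tokens[:-1])
    return {(h, w): c / hist_counts[h] for (h, w), c in pair_counts.items()}

def mixture_prob(history, word, in_domain, web, lam=0.7, floor=1e-6):
    """Linear interpolation of two bigram models: lam*P_in + (1-lam)*P_web.
    Unseen bigrams are floored rather than properly smoothed."""
    p_in = in_domain.get((history, word), floor)
    p_web = web.get((history, word), floor)
    return lam * p_in + (1.0 - lam) * p_web

# Toy corpora standing in for transcribed speech and filtered web text.
in_domain = bigram_probs("i mean i mean you know".split())
web = bigram_probs("you know i think you know".split())

p = mixture_prob("you", "know", in_domain, web)
```

Selecting and filtering whole documents, as the paper does, controls which text contributes to these counts; the interpolation step itself is the same.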


machine learning for multimodal interaction | 2005

Further progress in meeting recognition: the ICSI-SRI spring 2005 speech-to-text evaluation system

Andreas Stolcke; Xavier Anguera; Kofi Boakye; Özgür Çetin; Frantisek Grezl; Adam Janin; Arindam Mandal; Barbara Peskin; Chuck Wooters; Jing Zheng

We describe the development of our speech recognition system for the National Institute of Standards and Technology (NIST) Spring 2005 Meeting Rich Transcription (RT-05S) evaluation, highlighting improvements made since last year [1]. The system is based on the SRI-ICSI-UW RT-04F conversational telephone speech (CTS) recognition system, with meeting-adapted models and various audio preprocessing steps. This year's system features better delay-sum processing of distant microphone channels and energy-based crosstalk suppression for close-talking microphones. Acoustic modeling is improved by virtue of various enhancements to the background (CTS) models, including added training data, decision-tree based state tying, and the inclusion of discriminatively trained phone posterior features estimated by multilayer perceptrons. In particular, we make use of adaptation of both acoustic models and MLP features to the meeting domain. For distant microphone recognition we obtained considerable gains by combining and cross-adapting narrow-band (telephone) acoustic models with broadband (broadcast news) models. Language models (LMs) were improved with the inclusion of new meeting and web data. In spite of a lack of training data, we created effective LMs for the CHIL lecture domain. Results are reported on RT-04S and RT-05S meeting data. Measured on RT-04S conference data, we achieved an overall improvement of 17% relative in both MDM and IHM conditions compared to last year's evaluation system. Results on lecture data are comparable to the best reported results for that task.
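The delay-sum processing of distant microphone channels can be sketched as below. This is a minimal illustration that assumes the per-channel delays (in samples) are already known and uses a circular shift for alignment; a real front end would estimate delays (e.g., by cross-correlation) and align windowed, non-circular segments:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each distant-microphone channel by its estimated delay
    (in samples) and average the channels, reinforcing the foreground
    speaker relative to uncorrelated noise."""
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

# Synthetic example: one source arriving at two mics, the second with
# a 3-sample lag (circular shift keeps the toy example exact).
rng = np.random.default_rng(0)
source = rng.standard_normal(100)
mic1 = source
mic2 = np.roll(source, 3)
out = delay_and_sum([mic1, mic2], delays=[0, 3])
```

After alignment, the two toy channels are identical, so their average recovers the source exactly; with real microphones the gain comes from averaging down the uncorrelated noise components.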


international conference on acoustics, speech, and signal processing | 2007

An Articulatory Feature-Based Tandem Approach and Factored Observation Modeling

Özgür Çetin; Arthur Kantor; Simon King; Chris D. Bartels; Mathew Magimai-Doss; Joe Frankel; Karen Livescu

The so-called tandem approach, where the posteriors of a multilayer perceptron (MLP) classifier are used as features in an automatic speech recognition (ASR) system, has proven to be a very effective method. Most tandem approaches to date have relied on MLPs trained for phone classification, appending the posterior features to standard acoustic features in a hidden Markov model (HMM) system. In this paper, we develop an alternative tandem approach based on MLPs trained for articulatory feature (AF) classification. We also develop a factored observation model for characterizing the posterior and standard features at the HMM outputs, allowing for separate hidden mixture and state-tying structures for each factor. In experiments on a subset of Switchboard, we show that the AF-based tandem approach is as effective as the phone-based approach, and that the factored observation model significantly outperforms the simple feature concatenation approach while using fewer parameters.
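A minimal sketch of the factored-observation idea: instead of concatenating the standard and MLP-posterior features into one vector scored by a single density, each stream gets its own density and the per-state log-likelihoods are summed. Single diagonal Gaussians and the dimensionalities below are simplifying assumptions; the paper uses mixtures with separate state-tying per factor:

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def factored_log_likelihood(std_feats, mlp_feats, state):
    """Factored observation model: standard features and MLP posterior
    features are scored by separate densities; the state log-likelihood
    is their sum (a product of factors in the linear domain)."""
    return (log_gauss(std_feats, state["std_mean"], state["std_var"])
            + log_gauss(mlp_feats, state["mlp_mean"], state["mlp_var"]))

# Hypothetical HMM state with independent parameters for each factor:
# 13 standard (e.g., cepstral) dimensions and 4 posterior dimensions.
state = {"std_mean": np.zeros(13), "std_var": np.ones(13),
         "mlp_mean": np.zeros(4),  "mlp_var": np.ones(4)}
ll = factored_log_likelihood(np.zeros(13), np.zeros(4), state)
```

Because the factors are parameterized independently, each can have its own mixture size and tying, which is where the parameter savings over plain concatenation come from.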


international conference on acoustics, speech, and signal processing | 2007

Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition

Jing Zheng; Özgür Çetin; Mei-Yuh Hwang; Xin Lei; Andreas Stolcke; Nelson Morgan

Recent developments in large vocabulary continuous speech recognition (LVCSR) have shown the effectiveness of discriminative training approaches, employing the following three representative techniques: discriminative Gaussian training using the minimum phone error (MPE) criterion; discriminatively trained features estimated by multilayer perceptrons (MLPs); and discriminative feature transforms such as feature-level MPE (fMPE). Although MLP features, MPE models, and fMPE transforms have each been shown to improve recognition accuracy, no previous work has applied all three in a single LVCSR system. This paper uses a state-of-the-art Mandarin recognition system as a platform to study the interaction of all three techniques. Experiments in the broadcast news and broadcast conversation domains show that the contribution of each technique is nonredundant, and that the full combination yields the best performance and has good domain generalization.


international workshop on data mining and audience intelligence for advertising | 2009

Data-driven text features for sponsored search click prediction

Benyah Shaparenko; Özgür Çetin; Rukmini Iyer

Much search engine revenue comes from sponsored search ads displayed with algorithmic search results. To maximize revenue, it is essential to choose a good slate of ads for each query, requiring accurate prediction of whether or not users will click on an ad. Click prediction is relatively easy for ads that have been displayed many times and have significant click history, but in the long tail with minimal or no click history, other features are needed to predict user response. In this work, we investigate the use of novel text features for this problem, within the context of a state-of-the-art sponsored search system. In particular, we propose the use of detailed word-pair indicator features between the query and ad. We compare the new features to the traditional vector-space and language modeling features extracted in a typical information-retrieval style. We evaluate these approaches in a maximum-entropy ranking model using click-view data from commercial search-engine traffic. We show that the word-pair features are highly helpful for sponsored search click prediction, not only improving over sophisticated click-history feedback based systems, but also compensating for the lack of click history to some extent. In contrast, we find that the language and vector-space modeling approaches are significantly less effective.
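The word-pair indicator features can be sketched as the cross product of query words and ad words, each pair becoming a binary feature for the ranking model. The feature-name format and the whitespace tokenization below are illustrative assumptions, not the paper's exact pipeline:

```python
def word_pair_features(query, ad_text):
    """Cross-product indicator features between query words and ad words,
    the kind of feature used for click prediction on ads with little or
    no click history."""
    q_words = set(query.lower().split())
    a_words = set(ad_text.lower().split())
    return {f"pair:{q}|{a}" for q in q_words for a in a_words}

feats = word_pair_features("cheap flights", "discount flights to paris")
```

Each indicator would receive its own weight in the maximum-entropy model, letting the learner discover, for example, that the pair (cheap, discount) predicts clicks even when the ad itself is new.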


ieee automatic speech recognition and understanding workshop | 2007

Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs

Özgür Çetin; Mathew Magimai-Doss; Karen Livescu; Arthur Kantor; Simon King; Chris D. Bartels; Joe Frankel

The features derived from posteriors of a multilayer perceptron (MLP), known as tandem features, have proven to be very effective for automatic speech recognition. Most tandem features to date have relied on MLPs trained for phone classification. We recently showed on a relatively small data set that MLPs trained for articulatory feature classification can be equally effective. In this paper, we provide a similar comparison using MLPs trained on a much larger data set: 2000 hours of English conversational telephone speech. We also explore how portable phone- and articulatory feature-based tandem features are in an entirely different language - Mandarin - without any retraining. We find that while the phone-based features perform slightly better than AF-based features in the matched-language condition, they perform significantly better in the cross-language condition. However, in the cross-language condition, neither approach is as effective as the tandem features extracted from an MLP trained on a relatively small amount of in-domain data. Beyond feature concatenation, we also explore novel factored observation modeling schemes that allow for greater flexibility in combining the tandem and standard features.


international conference on acoustics, speech, and signal processing | 2006

Speaker Overlaps and ASR Errors in Meetings: Effects Before, During, and After the Overlap

Özgür Çetin; Elizabeth Shriberg

We analyze automatic speech recognition (ASR) errors made by a state-of-the-art meeting recognizer, with respect to locations of overlapping speech. Our analysis focuses on recognition errors made both during an overlap and in the regions immediately preceding and following the location of overlapped speech. We devise an experimental paradigm to allow examination of the same foreground speech both with and without naturally occurring cross-talk. We then analyze ASR errors with respect to a number of factors, including the severity of the cross-talk and distance from the overlap region. In addition to reporting effects on ASR errors, we discover a number of interesting phenomena. First, we find that overlaps tend to occur at high-perplexity regions in the foreground talker's speech. Second, word sequences within overlaps have higher perplexity than those in nonoverlaps, if using trigrams or 4-grams, but the unigram perplexity within overlaps is considerably lower than that of nonoverlaps. An explanation for this behavior is proposed, based on the preponderance of multiple short dialog acts found in overlap regions. Third, we discover that the word error rate (WER) after overlaps is consistently lower than that before the overlap. This finding cannot be explained by the recognition process itself; rather, the foreground speaker appears to reduce perplexity shortly after being overlapped. Taken together, these observations suggest that the automatic modeling of meetings could benefit from a broader view of the relationship between speaker overlap and ASR in natural conversation.
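The perplexity comparisons above rest on a standard per-word computation; a minimal bigram version looks like the sketch below. The toy model, the floor for unseen bigrams, and the lack of sentence-boundary handling are simplifications of what a real LM toolkit would do:

```python
import math

def perplexity(sentence, bigram_lm, floor=1e-4):
    """Per-word perplexity of a word sequence under a bigram model.
    Unseen bigrams are floored rather than properly smoothed."""
    words = sentence.split()
    log_prob = 0.0
    for hist, word in zip(words, words[1:]):
        log_prob += math.log(bigram_lm.get((hist, word), floor))
    n = len(words) - 1  # number of predicted words
    return math.exp(-log_prob / n)

# Hypothetical bigram probabilities for illustration only.
lm = {("you", "know"): 0.5, ("know", "what"): 0.25}
pp = perplexity("you know what", lm)
```

Computing this separately for words inside and outside overlap regions is what lets the paper contrast unigram versus trigram/4-gram perplexity around overlaps.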


machine learning for multimodal interaction | 2006

Overlap in meetings: ASR effects and analysis by dialog factors, speakers, and collection site

Özgür Çetin; Elizabeth Shriberg

We analyze speaker overlap in multiparty meetings both in terms of automatic speech recognition (ASR) performance, and in terms of distribution of overlap with respect to various factors (collection site, speakers, dialog acts, and hot spots). Unlike most previous work on overlap or crosstalk, our ASR error analysis uses an approach that allows comparison of the same foreground speech with and without naturally occurring overlap, using a state-of-the-art meeting recognition system. We examine a total of 101 meetings. For analysis of ASR, we use 26 meetings from the NIST meeting transcription evaluations, and discover a number of interesting phenomena. First, overlaps tend to occur at high-perplexity regions in the foreground talker's speech. Second, overlap regions tend to have higher perplexity than nonoverlap regions if trigrams or 4-grams are used, but unigram perplexity within overlaps is considerably lower than that of nonoverlaps. Third, word error rate (WER) after overlaps is consistently lower than that before the overlap, apparently because the foreground speaker reduces perplexity shortly after being overlapped. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content were conducted on a set of 75 additional meetings collected and annotated at ICSI. These analyses reveal interesting relationships between overlap and dialog acts, as well as between overlap and “hot spots” (points of increased participant involvement). Finally, results from this larger data set show that individual speakers have widely varying rates of being overlapped.


international conference on acoustics, speech, and signal processing | 2007

Manual Transcription of Conversational Speech at the Articulatory Feature Level

Karen Livescu; A. Bezman; M. Borges; Lisa Yung; Özgür Çetin; Joe Frankel; Simon King; Mathew Magimai-Doss; Xuemin Xhi; L. Lavoie

We present an approach for the manual labeling of speech at the articulatory feature level, and a new set of labeled conversational speech collected using this approach. A detailed transcription, including overlapping or reduced gestures, is useful for studying the great pronunciation variability in conversational speech. It also facilitates the testing of feature classifiers, such as those used in articulatory approaches to automatic speech recognition. We describe an effort to transcribe a small set of utterances drawn from the Switchboard database using eight articulatory tiers. Two transcribers have labeled these utterances in a multi-pass strategy, allowing for correction of errors. We describe the data collection methods and analyze the data to determine how quickly and reliably this type of transcription can be done. Finally, we demonstrate one use of the new data set by testing a set of multilayer perceptron feature classifiers against both the manual labels and forced alignments.
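The reliability analysis mentioned above can be approximated, in its crudest form, as frame-level percent agreement between the two transcribers on each articulatory tier. The tier name and label values below are hypothetical; real reliability studies typically also report chance-corrected measures such as Cohen's kappa:

```python
def tier_agreement(labels_a, labels_b):
    """Frame-level percent agreement between two transcribers on one
    articulatory tier (a crude stand-in for a full reliability analysis)."""
    assert len(labels_a) == len(labels_b), "tiers must cover the same frames"
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical lip-rounding labels over 8 frames from two transcribers.
a = ["round", "round", "wide", "wide", "wide", "round", "round", "round"]
b = ["round", "round", "wide", "round", "wide", "round", "round", "wide"]
agree = tier_agreement(a, b)
```

Running this per tier is what makes it possible to say which articulatory dimensions can be labeled quickly and which remain ambiguous even for careful human transcribers.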


IEEE Transactions on Signal Processing | 2007

Multirate Coupled Hidden Markov Models and Their Application to Machining Tool-Wear Classification

Özgür Çetin; Mari Ostendorf; Gary D. Bernard

This paper introduces multirate coupled hidden Markov models (multirate HMMs, for short) for multiscale modeling of nonstationary processes, extending traditional HMMs from single to multiple time scales with hierarchical representations of the process state and observations. Scales in the multirate HMMs are organized in a coarse-to-fine manner with Markov conditional independence assumptions within and across scales, allowing for a parsimonious representation of both short- and long-term context and temporal dynamics. Efficient inference and parameter estimation algorithms for the multirate HMMs are given, which are similar to the analogous algorithms for HMMs. The model is applied to the classification of tool wear in titanium milling, for which acoustic emissions exhibit multiscale dynamics and long-range dependence. Experimental results show that the multirate extension outperforms HMMs in terms of both wear prediction accuracy and confidence estimation.
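For orientation, the inference the multirate model generalizes is the standard single-rate HMM forward pass, sketched below on a toy two-state chain. The transition matrix, initial distribution, and per-frame observation log-probabilities are invented for illustration, and no scaling/log-space trick is used, so this is only suitable for short sequences:

```python
import numpy as np

def forward_log_likelihood(obs_logprobs, trans, init):
    """Plain HMM forward pass: obs_logprobs[t][s] is the log-probability
    of frame t under state s. Returns the sequence log-likelihood.
    Multirate HMM inference runs analogous recursions across coupled
    time scales."""
    alpha = init * np.exp(obs_logprobs[0])
    for t in range(1, len(obs_logprobs)):
        alpha = (alpha @ trans) * np.exp(obs_logprobs[t])
    return np.log(alpha.sum())

# Toy 2-state chain with fixed per-frame observation probabilities.
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
init = np.array([0.5, 0.5])
obs = np.log(np.array([[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]))
ll = forward_log_likelihood(obs, trans, init)
```

In the multirate case, a coarse-scale state additionally conditions the finer scale's transitions and observations, which is what lets the model capture the long-range dependence seen in the acoustic emission signals.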

Collaboration


Top co-authors of Özgür Çetin:

Joe Frankel (University of Edinburgh)
Karen Livescu (Toyota Technological Institute at Chicago)
Simon King (University of Edinburgh)
Adam Janin (University of California)
Kofi Boakye (University of California)
Xavier Anguera (International Computer Science Institute)
Chuck Wooters (International Computer Science Institute)