Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Gökhan Tür is active.

Publication


Featured research published by Gökhan Tür.


Speech Communication | 2000

Prosody-based automatic segmentation of speech into sentences and topics

Elizabeth Shriberg; Andreas Stolcke; Dilek Hakkani-Tür; Gökhan Tür

Abstract A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models – for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

The CALO Meeting Assistant System

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; Stanley Peters; Dilek Hakkani-Tür; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.


Spoken Language Technology Workshop | 2008

The CALO meeting speech recognition and understanding system

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Dilek Hakkani-Tür; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Stanley Peters; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO meeting assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.


Computational Linguistics | 2001

Integrating prosodic and lexical cues for automatic topic segmentation

Gökhan Tür; Andreas Stolcke; Dilek Hakkani-Tür; Elizabeth Shriberg

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources.
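The combination step can be illustrated with a minimal sketch: given a boundary posterior from a lexical model and one from a prosodic model, interpolate them log-linearly and threshold the result. The function names, the interpolation weight, and all posterior values below are illustrative assumptions, not the paper's actual models (which use an HMM over words and a decision tree over prosodic features).

```python
# Hedged sketch: log-linear combination of lexical and prosodic
# boundary posteriors, then thresholding to decide topic boundaries.

def combine_posteriors(p_lex, p_pros, lam=0.5):
    """Log-linear interpolation of two boundary posteriors."""
    return (p_lex ** lam) * (p_pros ** (1 - lam))

def segment(lex_scores, pros_scores, lam=0.5, threshold=0.5):
    """Mark a boundary wherever the combined posterior exceeds threshold."""
    return [combine_posteriors(pl, pp, lam) > threshold
            for pl, pp in zip(lex_scores, pros_scores)]

# Toy example: made-up posteriors for four candidate boundaries.
boundaries = segment([0.9, 0.2, 0.6, 0.1], [0.8, 0.3, 0.7, 0.2])
```

With equal weight (lam=0.5) this reduces to the geometric mean of the two posteriors, so a boundary is kept only when both knowledge sources lend it reasonable support.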


International Conference on Computational Linguistics | 2000

Statistical morphological disambiguation for agglutinative languages

Dilek Hakkani-Tür; Kemal Oflazer; Gökhan Tür

In this paper, we present statistical models for morphological disambiguation in Turkish. Turkish presents an interesting problem for statistical models since the potential tag set size is very large because of the productive derivational morphology. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflection groups in a trigram model. Among the three models that we have developed and tested, the simplest model ignoring the local morphotactics within words performs the best. Our best trigram model performs with 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are just interested in syntactically relevant features and ignore a very small set of semantic features, then the accuracy increases to 95.07%.
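The core idea of scoring a tag as a sequence of inflectional groups (IGs) with a trigram model can be sketched as follows. The IG strings, the toy training data, and the add-alpha smoothing are illustrative assumptions, not the paper's actual tag set or smoothing scheme.

```python
from collections import Counter

# Hedged sketch: score morphosyntactic tags as sequences of
# inflectional groups (IGs) under a smoothed trigram model.

def train_trigrams(sequences):
    """Count IG trigrams and their bigram histories (padded with <s>)."""
    tri, bi = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>", "<s>"] + list(seq)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            bi[tuple(padded[i - 2:i])] += 1
    return tri, bi

def sequence_prob(seq, tri, bi, vocab_size, alpha=1.0):
    """Add-alpha smoothed trigram probability of an IG sequence."""
    padded = ["<s>", "<s>"] + list(seq)
    prob = 1.0
    for i in range(2, len(padded)):
        t, b = tuple(padded[i - 2:i + 1]), tuple(padded[i - 2:i])
        prob *= (tri[t] + alpha) / (bi[b] + alpha * vocab_size)
    return prob

# Toy training data with made-up IG labels:
train = [["Noun+A3sg", "Verb+Pos", "Adj"],
         ["Noun+A3sg", "Verb+Pos", "Noun+A3sg"]]
tri, bi = train_trigrams(train)
p = sequence_prob(["Noun+A3sg", "Verb+Pos", "Adj"], tri, bi, vocab_size=3)
```

In disambiguation, candidate analyses of a word sequence would each be scored this way and the highest-probability one selected.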


International Conference on Acoustics, Speech, and Signal Processing | 2006

Multitask Learning for Spoken Language Understanding

Gökhan Tür

In this paper, we present a multitask learning (MTL) method for intent classification in goal-oriented human-machine spoken dialog systems. MTL aims at training tasks in parallel while using a shared representation. What is learned for each task can help other tasks be learned better. Our goal is to automatically re-use the existing labeled data from various applications, which are similar but may have different intents or intent distributions, in order to improve the performance. For this purpose, we propose an automated intent mapping algorithm across applications. We also propose employing active learning to selectively sample the data to be re-used. Our results indicate that we can achieve significant improvements in intent classification performance, especially when the labeled data size is limited.


Spoken Language Technology Workshop | 2006

Model Adaptation for Dialog Act Tagging

Gökhan Tür; Umit Guz; Dilek Hakkani-Tür

In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study we used the ICSI meeting corpus with high-level meeting recognition dialog act (MRDA) tags, that is, question, statement, backchannel, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that we can achieve significantly better dialog act tagging by automatically selecting a subset of the Switchboard corpus and combining the confidences obtained by both in-domain and out-of-domain models via logistic regression, especially when the in-domain data is limited.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Unsupervised Language Model Adaptation for Meeting Recognition

Gökhan Tür; Andreas Stolcke

We present an application of unsupervised language model (LM) adaptation to meeting recognition, in a scenario where sequences of multiparty meetings on related topics are to be recognized, but no prior in-domain data for LM training is available. The recognizer LMs are adapted according to the recognition output on temporally preceding meetings, either in speaker-dependent or speaker-independent mode. Model adaptation is carried out by interpolating the n-gram probabilities of a large generic LM with those of a small LM estimated from adaptation data, and minimizing perplexity on the automatic transcripts of a separate meeting set, also previously recognized. The adapted LMs yield about 5.9% relative reduction in word error compared to the baseline. This improvement is about half of what can be achieved with supervised adaptation, i.e., using human-generated speech transcripts.
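The interpolation-and-tuning step can be sketched in miniature: mix a generic LM with a small adaptation LM and grid-search the weight that minimizes perplexity on held-out text. Unigrams stand in for the n-grams used in the paper, and all corpora, vocabulary, and the grid are toy assumptions.

```python
import math
from collections import Counter

# Hedged sketch: interpolate two unigram LMs and tune the mixture
# weight to minimize perplexity on held-out (recognized) text.

def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    return {w: (counts[w] + 1) / (len(tokens) + len(vocab)) for w in vocab}

def perplexity(tokens, generic, adapt, lam):
    """Perplexity of the mixture lam*generic + (1-lam)*adapt."""
    logp = sum(math.log(lam * generic[w] + (1 - lam) * adapt[w])
               for w in tokens)
    return math.exp(-logp / len(tokens))

def best_lambda(heldout, generic, adapt):
    """Grid-search the interpolation weight on held-out perplexity."""
    grid = [i / 10 for i in range(1, 10)]
    return min(grid, key=lambda lam: perplexity(heldout, generic, adapt, lam))

vocab = {"the", "a", "meeting", "agenda"}
generic = unigram_lm(["the", "a", "the", "a"], vocab)        # toy generic corpus
adapt = unigram_lm(["meeting", "agenda", "meeting"], vocab)  # toy preceding-meeting transcripts
lam = best_lambda(["meeting", "agenda"], generic, adapt)
```

Because the held-out text here matches the adaptation data, the search favors a small generic weight; with realistic data the tuned weight balances the two sources.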


Conference of the International Speech Communication Association | 2016

Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM

Dilek Hakkani-Tür; Gökhan Tür; Asli Celikyilmaz; Yun-Nung Vivian Chen; Jianfeng Gao; Li Deng; Ye-Yi Wang

Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most of the previous studies explored this framework for building single domain models for each task, such as slot filling or domain classification, comparing deep learning based approaches with conventional ones like conditional random fields. This paper proposes a holistic multi-domain, multi-task (i.e. slot filling, domain and intent detection) modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system, demonstrating the distinctive power of deep learning methods, namely bi-directional recurrent neural networks (RNN) with long short-term memory (LSTM) cells (RNN-LSTM), to handle such complexity. The contributions of the presented work are three-fold: (i) we propose an RNN-LSTM architecture for joint modeling of slot filling, intent determination, and domain classification; (ii) we build a joint multi-domain model enabling multi-task deep learning where the data from each domain reinforces each other; (iii) we investigate alternative architectures for modeling lexical context in spoken language understanding. In addition to the simplicity of the single model framework, experimental results show the power of such an approach on Microsoft Cortana real user data over alternative methods based on single domain/task deep learning.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Exploiting query click logs for utterance domain detection in spoken language understanding

Dilek Hakkani-Tür; Larry P. Heck; Gökhan Tür

In this paper, we describe methods to exploit search queries mined from search engine query logs to improve domain detection in spoken language understanding. We propose extending the label propagation algorithm, a graph-based semi-supervised learning approach, to incorporate noisy domain information estimated from search engine links the users click following their queries. The main contributions of our work are the use of search query logs for domain classification, integration of noisy supervision into the semi-supervised label propagation algorithm, and sampling of high-quality query click data by mining query logs and using classification confidence scores. We show that most semi-supervised learning methods we experimented with improve the performance of the supervised training, and the biggest improvement is achieved by label propagation that uses noisy supervision. We reduce the error rate of domain detection by 20% relative, from 6.2% to 5.0%.
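The label propagation idea can be sketched on a tiny graph: queries are nodes, edge weights encode similarity, and labeled nodes (including noisily click-supervised ones) anchor an iterative score update. The matrix values, domain names, and the clamping variant below are toy assumptions, not the paper's actual graph construction.

```python
# Hedged sketch of graph-based label propagation for domain detection.
# F is updated as alpha * W @ F + (1 - alpha) * Y, with labeled nodes
# clamped back to their initial scores after each iteration.

def label_propagation(W, Y, labeled, iters=100, alpha=0.9):
    """W: row-normalized similarity matrix; Y: initial label scores
    (one row per node, one column per domain); labeled: node indices
    whose scores are clamped back to Y after every update."""
    n, k = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(W[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c] for c in range(k)] for i in range(n)]
        for i in labeled:  # clamp labeled / click-supervised nodes
            F[i] = Y[i][:]
    return F

W = [[0.0, 1.0, 0.0],
     [0.8, 0.0, 0.2],   # unlabeled query, more similar to node 0
     [0.0, 1.0, 0.0]]
Y = [[1.0, 0.0],        # labeled: toy domain "weather"
     [0.0, 0.0],        # unlabeled query
     [0.0, 1.0]]        # noisily labeled from clicks: toy domain "sports"
F = label_propagation(W, Y, labeled=[0, 2])
```

After propagation, the unlabeled query inherits most of its score from the nearer labeled node, so its argmax domain follows its strongest graph neighborhood.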

Collaboration


Dive into Gökhan Tür's collaborations.

Top Co-Authors

Kemal Oflazer

Carnegie Mellon University

Sibel Yaman

University of California