
Publication


Featured research published by Dilek Hakkani-Tür.


Speech Communication | 2000

Prosody-based automatic segmentation of speech into sentences and topics

Elizabeth Shriberg; Andreas Stolcke; Dilek Hakkani-Tür; Gökhan Tür

A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models – for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration and word-based cues dominate for natural conversation.
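
To make the combination step concrete, here is a minimal sketch, assuming a hypothetical prosodic classifier that emits a boundary posterior per inter-word position and a hypothetical language-model boundary probability at the same position; the interpolation weight and threshold are illustrative, not the paper's values.

```python
import math

def combine_boundary_scores(prosody_posterior, lm_boundary_prob, weight=0.5):
    """Interpolate prosodic and lexical evidence for a boundary in the log domain.

    prosody_posterior: P(boundary | prosodic features), e.g. from a decision tree.
    lm_boundary_prob:  P(boundary | word context), e.g. from a boundary-event LM.
    weight:            interpolation weight for the prosodic model (0..1).
    All inputs and the weight are illustrative assumptions, not values from the paper.
    """
    eps = 1e-12  # avoid log(0)
    log_p = weight * math.log(prosody_posterior + eps) \
          + (1.0 - weight) * math.log(lm_boundary_prob + eps)
    return math.exp(log_p)

def segment(positions, threshold=0.5):
    """Mark a sentence boundary wherever the combined score exceeds a threshold."""
    return [combine_boundary_scores(p, l) > threshold for p, l in positions]

# Toy usage: three inter-word positions with (prosodic, lexical) boundary probabilities.
print(segment([(0.9, 0.8), (0.1, 0.3), (0.7, 0.2)]))
```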


Archive | 2003

Building a Turkish Treebank

Kemal Oflazer; Bilge Say; Dilek Hakkani-Tür; Gokhan Tur

We present the issues that we have encountered in designing a treebank architecture for Turkish, along with the rationale for the choices we have made for various representation schemes. In the resulting representation, the information encoded in the complex agglutinative word structures is represented as a sequence of inflectional groups separated by derivational boundaries. The syntactic relations are encoded as labeled dependency relations among segments of lexical items marked by derivation boundaries. Our current work involves refining a set of treebank annotation guidelines and developing a sophisticated annotation tool with an extendable plug-in architecture for morphological analysis, morphological disambiguation, and syntactic annotation disambiguation.
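
As a rough illustration of this representation (not the treebank's actual format or tooling), the sketch below models a word as a sequence of inflectional groups, with a dependency link pointing at a specific inflectional group of its head word; the example analysis and indices are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class InflectionalGroup:
    """One segment of a word between derivational boundaries."""
    pos: str                 # part of speech of this derived form
    features: List[str]      # inflectional features carried by this group

@dataclass
class Word:
    surface: str
    root: str
    igs: List[InflectionalGroup]          # inflectional groups in derivation order
    # Dependency: (head word index, head IG index, relation label), or None for the root.
    head: Optional[Tuple[int, int, str]] = None

# Hypothetical analysis of one agglutinative word containing a derivational boundary.
word = Word(
    surface="evimdeki",
    root="ev",
    igs=[
        InflectionalGroup(pos="Noun", features=["A3sg", "P1sg", "Loc"]),
        InflectionalGroup(pos="Adj", features=["Rel"]),   # derived adjective
    ],
    head=(2, 0, "MODIFIER"),   # modifies IG 0 of word index 2 (made-up indices)
)
print(word)
```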


Speech Communication | 2005

Combining active and semi-supervised learning for spoken language understanding

Dilek Hakkani-Tür; Robert E. Schapire; Gokhan Tur

In this paper, we describe active and semi-supervised learning methods for reducing the labeling effort for spoken language understanding. In a goal-oriented call routing system, understanding the intent of the user can be framed as a classification problem. State-of-the-art statistical classification systems are trained using a large number of human-labeled utterances, the preparation of which is labor-intensive and time-consuming. Active learning aims to minimize the number of labeled utterances by automatically selecting the utterances that are likely to be most informative for labeling. The method for active learning we propose, inspired by certainty-based active learning, selects the examples that the classifier is the least confident about. The examples that are classified with higher confidence scores (hence not selected by active learning) are exploited using two semi-supervised learning methods. The first method augments the training data by using the machine-labeled classes for the unlabeled utterances. The second method instead augments the classification model trained using the human-labeled utterances with the machine-labeled ones in a weighted manner. We then combine active and semi-supervised learning using selectively sampled and automatically labeled data. This enables us to exploit all collected data and alleviates the data imbalance problem caused by employing only active or semi-supervised learning. We have evaluated these active and semi-supervised learning methods with a call classification system used for AT&T customer care. Our results indicate that it is possible to reduce human labeling effort significantly.
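
A minimal sketch of the selection and augmentation steps described above, assuming a classifier with a scikit-learn-style predict_proba interface and a pool of unlabeled utterances; the confidence threshold and the simple data-augmentation variant shown are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def split_by_confidence(clf, X_unlabeled, confidence_threshold=0.8):
    """Certainty-based selection: least-confident examples go to human labeling,
    confidently classified ones keep their machine-assigned labels."""
    probs = clf.predict_proba(X_unlabeled)
    confidence = probs.max(axis=1)          # top-class posterior per utterance
    machine_labels = probs.argmax(axis=1)

    to_label = np.where(confidence < confidence_threshold)[0]    # send to annotators
    to_augment = np.where(confidence >= confidence_threshold)[0]
    return to_label, to_augment, machine_labels

def augment_training_data(X_human, y_human, X_unlabeled, to_augment, machine_labels):
    """First semi-supervised variant: add machine-labeled utterances to the training set."""
    X_aug = np.vstack([X_human, X_unlabeled[to_augment]])
    y_aug = np.concatenate([y_human, machine_labels[to_augment]])
    return X_aug, y_aug

# Example usage with any classifier exposing predict_proba (feature matrices assumed):
# to_label, to_augment, machine_labels = split_by_confidence(clf, X_pool)
# X_aug, y_aug = augment_training_data(X_train, y_train, X_pool, to_augment, machine_labels)
# clf.fit(X_aug, y_aug)
```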


IEEE Transactions on Audio, Speech, and Language Processing | 2010

The CALO Meeting Assistant System

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; Stanley Peters; Dilek Hakkani-Tür; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.
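
The components listed above can be pictured as a processing pipeline over a meeting transcript. The sketch below is only a schematic of that flow; the stage names follow the abstract, but the functions are empty placeholders and the interfaces are assumptions, not the CALO-MA code.

```python
from typing import Callable, List

# Each stage consumes and produces a list of annotated segments (represented here as dicts).
Stage = Callable[[List[dict]], List[dict]]

def run_pipeline(segments: List[dict], stages: List[Stage]) -> List[dict]:
    """Apply the understanding components in order, as in an offline processing pass."""
    for stage in stages:
        segments = stage(segments)
    return segments

# Placeholder stages mirroring the components named in the abstract.
def dialog_act_segmentation(segs): return segs
def topic_segmentation(segs): return segs
def question_answer_pairing(segs): return segs
def action_item_recognition(segs): return segs
def decision_extraction(segs): return segs
def summarization(segs): return segs

transcript = [{"speaker": "A", "text": "let's decide the release date"}]
print(run_pipeline(transcript, [
    dialog_act_segmentation, topic_segmentation, question_answer_pairing,
    action_item_recognition, decision_extraction, summarization,
]))
```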


Spoken Language Technology Workshop | 2008

The CALO meeting speech recognition and understanding system

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Dilek Hakkani-Tür; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Stanley Peters; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO meeting assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.


Computational Linguistics | 2001

Integrating prosodic and lexical cues for automatic topic segmentation

Gökhan Tür; Andreas Stolcke; Dilek Hakkani-Tür; Elizabeth Shriberg

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources.
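
One of the two combination methods above treats boundaries as hidden events in an HMM. The sketch below is a simplified illustration under assumptions: two hidden states (no boundary / boundary) per inter-sentence position, observation scores from hypothetical prosodic and lexical models, and standard Viterbi decoding; it is not the paper's trained model.

```python
import math

def viterbi_boundaries(emission_logprobs, trans_logprobs):
    """Label each position as boundary (1) or no boundary (0) with Viterbi decoding.

    emission_logprobs: per position, (log P(obs | no_boundary), log P(obs | boundary));
                       the observation scores would come from prosodic/lexical models.
    trans_logprobs:    2x2 log transition matrix between the two hidden states.
    All inputs here are assumed, illustrative values, not a trained model.
    """
    n_states = 2
    v = [emission_logprobs[0][s] for s in range(n_states)]
    backpointers = []
    for emit in emission_logprobs[1:]:
        new_v, ptr = [], []
        for s in range(n_states):
            cands = [v[p] + trans_logprobs[p][s] for p in range(n_states)]
            best_prev = max(range(n_states), key=lambda p: cands[p])
            new_v.append(cands[best_prev] + emit[s])
            ptr.append(best_prev)
        backpointers.append(ptr)
        v = new_v
    # Backtrace from the best final state.
    state = max(range(n_states), key=lambda s: v[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

# Toy run: three inter-sentence positions with log-domain observation scores.
obs = [(math.log(0.7), math.log(0.3)),
       (math.log(0.2), math.log(0.8)),
       (math.log(0.9), math.log(0.1))]
trans = [[math.log(0.6), math.log(0.4)],
         [math.log(0.7), math.log(0.3)]]
print(viterbi_boundaries(obs, trans))   # -> [0, 1, 0]
```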


IEEE Transactions on Speech and Audio Processing | 2005

Active learning: theory and applications to automatic speech recognition

Giuseppe Riccardi; Dilek Hakkani-Tür

We are interested in the problem of adaptive learning in the context of automatic speech recognition (ASR). In this paper, we propose an active learning algorithm for ASR. Automatic speech recognition systems are trained using human supervision to provide transcriptions of speech utterances. The goal of active learning is to minimize the human supervision for training acoustic and language models and to maximize the performance given the transcribed and untranscribed data. Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. In this paper, we describe how to estimate the confidence score for each utterance through an on-line algorithm using the lattice output of a speech recognizer. The utterance scores are filtered through the informativeness function, and an optimal subset of training samples is selected. The active learning algorithm has been applied to both batch and on-line learning schemes, and we have experimented with different selective sampling algorithms. Our experiments show that by using active learning, the amount of labeled data needed for a given word accuracy can be reduced by more than 60% with respect to random sampling.
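
A small sketch of the selective sampling step, under assumptions: each utterance already carries a recognizer confidence score (e.g. derived from lattice posteriors), and the informativeness function simply prefers low confidence; both the function and the selection budget are illustrative.

```python
def informativeness(confidence):
    """Assumed informativeness function: low recognizer confidence = more informative."""
    return 1.0 - confidence

def select_for_transcription(utterances, budget):
    """Rank utterances by informativeness and pick the top `budget` for human transcription.

    utterances: list of (utterance_id, confidence_score) pairs.
    """
    ranked = sorted(utterances, key=lambda u: informativeness(u[1]), reverse=True)
    return [utt_id for utt_id, _ in ranked[:budget]]

# Toy pool of utterances with recognizer confidence scores (illustrative values).
pool = [("utt1", 0.95), ("utt2", 0.40), ("utt3", 0.70), ("utt4", 0.15)]
print(select_for_transcription(pool, budget=2))   # -> ['utt4', 'utt2']
```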


Conference on Security, Steganography, and Watermarking of Multimedia Contents | 2005

Natural Language Watermarking

Mikhail Mike Atallah; Srinivas Bangalore; Dilek Hakkani-Tür; Giuseppe Riccardi; Mercan Topkara; Umut Topkara

In this paper we discuss natural language watermarking, which uses the structure of the sentence constituents in natural language text in order to insert a watermark. This approach is different from techniques, collectively referred to as “text watermarking,” which embed information by modifying the appearance of text elements, such as lines, words, or characters. We provide a survey of the current state of the art in natural language watermarking and introduce terminology, techniques, and tools for text processing. We also examine the parallels and differences of the two watermarking domains and outline how techniques from the image watermarking domain may be applicable to the natural language watermarking domain.
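
As a toy illustration of hiding bits in sentence structure rather than in text appearance, the sketch below encodes one bit per sentence by choosing between two meaning-preserving syntactic variants; the paired variants are hypothetical inputs, and a real system would derive them from parsed constituent structure rather than take them as given.

```python
def embed_bits(variant_pairs, bits):
    """Pick variant 0 or 1 of each sentence according to the watermark bits.

    variant_pairs: list of (form_a, form_b) meaning-equivalent sentence pairs,
                   assumed to be produced by a syntactic transformation step.
    bits:          the watermark payload, one bit per sentence.
    """
    return [pair[bit] for pair, bit in zip(variant_pairs, bits)]

def extract_bits(watermarked, variant_pairs):
    """Recover the payload by checking which variant each sentence matches."""
    return [pair.index(sentence) for sentence, pair in zip(watermarked, variant_pairs)]

pairs = [
    ("The committee approved the proposal.", "The proposal was approved by the committee."),
    ("We released the update on Monday.", "On Monday, we released the update."),
]
payload = [1, 0]
text = embed_bits(pairs, payload)
assert extract_bits(text, pairs) == payload
print(text)
```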


Computers and The Humanities | 2002

Statistical Morphological Disambiguation for Agglutinative Languages

Dilek Hakkani-Tür; Kemal Oflazer; Gokhan Tur

We present statistical models for morphological disambiguation in agglutinative languages, with a specific application to Turkish. Turkish presents an interesting problem for statistical models, as the potential tag set size is very large because of the productive derivational morphology. We propose to handle this by breaking up the morphosyntactic tags into inflectional groups, each of which contains the inflectional features for each (intermediate) derived form. Our statistical models score the probability of each morphosyntactic tag by considering statistics over the individual inflectional groups and surface roots in trigram models. Among the four models that we have developed and tested, the simplest model ignoring the local morphotactics within words performs the best. Our best trigram model performs with 93.95% accuracy on our test data, getting all the morphosyntactic and semantic features correct. If we are just interested in syntactically relevant features and ignore a very small set of semantic features, then the accuracy increases to 95.07%.
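
The trigram scoring over roots and inflectional groups can be sketched as follows; the trigram probability functions, the candidate analyses, and the absence of any smoothing or backoff are all simplifying assumptions, not the paper's models.

```python
import math

def score_analysis(roots, ig_sequences, root_trigram, ig_trigram):
    """Score one candidate morphosyntactic analysis of a sentence.

    roots:        one root per word.
    ig_sequences: for each word, its sequence of inflectional groups (as strings).
    root_trigram, ig_trigram: functions returning P(item | two previous items);
                  assumed to be provided by separately trained models.
    """
    log_p = 0.0
    # Root trigram over the words of the sentence.
    padded = ["<s>", "<s>"] + roots
    for i in range(2, len(padded)):
        log_p += math.log(root_trigram(padded[i], padded[i - 1], padded[i - 2]))
    # Inflectional-group trigram, treating each word's IGs as a sequence of events.
    flat_igs = ["<s>", "<s>"] + [ig for word_igs in ig_sequences for ig in word_igs]
    for i in range(2, len(flat_igs)):
        log_p += math.log(ig_trigram(flat_igs[i], flat_igs[i - 1], flat_igs[i - 2]))
    return log_p

def disambiguate(candidates, root_trigram, ig_trigram):
    """Pick the highest-scoring (roots, ig_sequences) candidate for the sentence."""
    return max(candidates, key=lambda c: score_analysis(c[0], c[1], root_trigram, ig_trigram))

# Toy usage with uniform placeholder "models" (real ones are estimated from a treebank):
uniform = lambda item, prev1, prev2: 0.1
cand_a = (["ev"], [["Noun+A3sg+P1sg+Loc", "Adj+Rel"]])   # two inflectional groups
cand_b = (["ev"], [["Noun+A3sg+P1sg+Loc"]])              # one inflectional group
print(disambiguate([cand_a, cand_b], uniform, uniform))
```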


International Conference on Acoustics, Speech, and Signal Processing | 2002

Active learning for automatic speech recognition

Dilek Hakkani-Tür; Giuseppe Riccardi; Allen L. Gorin

State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor intensive and time-consuming. In this paper, we describe a new method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. We automatically estimate a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. We compute utterance confidence scores based on these word confidence scores, then selectively sample the utterances to be transcribed using the utterance confidence scores. In our experiments, we show that we reduce the amount of labeled data needed for a given word accuracy by 27%.
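
A short sketch of the confidence step, assuming per-word posterior probabilities have already been read off the recognizer's lattice; averaging them into an utterance score and thresholding is one simple choice for illustration, not necessarily the exact scheme used in the paper.

```python
def utterance_confidence(word_confidences):
    """Aggregate word-level confidence scores (e.g. lattice posteriors) into one utterance score."""
    return sum(word_confidences) / len(word_confidences) if word_confidences else 0.0

def pick_utterances_to_transcribe(utterances, threshold=0.6):
    """Select for human transcription those utterances the recognizer is least sure about."""
    return [utt_id for utt_id, word_conf in utterances
            if utterance_confidence(word_conf) < threshold]

# Toy example: per-word confidences for three recognized utterances.
utts = [("utt1", [0.9, 0.95, 0.8]), ("utt2", [0.4, 0.5]), ("utt3", [0.7, 0.65, 0.6])]
print(pick_utterances_to_transcribe(utts))   # -> ['utt2']
```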

Collaboration


Dive into Dilek Hakkani-Tür's collaborations.

Top Co-Authors

Benoit Favre
University of California

Dustin Hillard
University of Washington