Kadri Hacioglu | Researchain

Publication

Featured research published by Kadri Hacioglu.

north american chapter of the association for computational linguistics | 2004

Automatic tagging of Arabic text: from raw text to base phrase chunks

Mona T. Diab; Kadri Hacioglu; Daniel Jurafsky

To date, there are no fully automated systems addressing the community's need for fundamental language processing tools for Arabic text. In this paper, we present a Support Vector Machine (SVM) based approach to automatically tokenize (segmenting off clitics), part-of-speech (POS) tag and annotate base phrases (BPs) in Arabic text. We adapt highly accurate tools that have been developed for English text and apply them to Arabic text. Using standard evaluation metrics, we report that the SVM-TOK tokenizer achieves an F-score (β=1) of 99.12, the SVM-POS tagger achieves an accuracy of 95.49%, and the SVM-BP chunker yields an F-score (β=1) of 92.08.

meeting of the association for computational linguistics | 2005

Semantic Role Labeling Using Different Syntactic Views

Sameer S. Pradhan; Wayne H. Ward; Kadri Hacioglu; James H. Martin; Daniel Jurafsky

Semantic role labeling is the process of annotating the predicate-argument structure in text with semantic labels. In this paper we present a state-of-the-art baseline semantic role labeling system based on Support Vector Machine classifiers. We show improvements on this system by: i) adding new features including features extracted from dependency parses, ii) performing feature selection and calibration and iii) combining parses obtained from semantic parsers trained using different syntactic views. Error analysis of the baseline system showed that approximately half of the argument identification errors resulted from parse errors in which there was no syntactic constituent that aligned with the correct argument. In order to address this problem, we combined semantic parses from a Minipar syntactic parse and from a chunked syntactic representation with our original baseline system which was based on Charniak parses. All of the reported techniques resulted in performance improvements.

north american chapter of the association for computational linguistics | 2003

Question classification with support vector machines and error correcting codes

Kadri Hacioglu; Wayne H. Ward

In this paper we consider a machine learning technique for question classification. The goal is to replace our regular-expression-based classifier with a classifier that learns from a set of labeled questions. We have realized that an enormous amount of time is required to create a rich collection of patterns and keywords for good coverage of questions in an open-domain application. We decided to use support vector machines, since they have been successfully used for a number of benchmark problems. Although support vector machines are inherently binary classifiers, it is possible to extend their use to multi-class classification using binary codes. We represent questions as frequency-weighted vectors of salient terms. We compare our approach to related work that uses relatively complex syntactic/semantic processing to create features and a sparse network of linear units to classify questions. We provide results to show the performance of the method.
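The idea of extending binary classifiers to multiple classes with binary codes can be sketched in Python as follows. Each class is assigned a codeword, one binary classifier is trained per bit, and a question is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. The class names and the 5-bit codebook here are hypothetical, chosen only for illustration; they are not taken from the paper, and the binary classifiers themselves are left out.

```python
from typing import Dict, List

# Hypothetical question classes with 5-bit codewords (illustrative assumption).
# Every pair of codewords differs in at least 3 bits, so one wrong bit can be corrected.
CODEBOOK: Dict[str, List[int]] = {
    "PERSON":   [0, 0, 0, 0, 0],
    "LOCATION": [0, 1, 1, 1, 0],
    "DATE":     [1, 0, 1, 0, 1],
    "NUMBER":   [1, 1, 0, 1, 1],
}


def bit_targets(labels: List[str], bit: int) -> List[int]:
    """Binary training targets for the classifier responsible for one code bit."""
    return [CODEBOOK[label][bit] for label in labels]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits: List[int]) -> str:
    """Pick the class whose codeword is nearest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], predicted_bits))


# The outputs of the five binary classifiers would be collected into one bit vector.
# Here the fourth bit is wrong, yet decoding still recovers DATE.
print(decode([1, 0, 1, 1, 1]))
```

In a full system each bit would be predicted by a separately trained SVM over the term vectors; the decoding step above is independent of how those bits are produced.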

international conference on computational linguistics | 2004

Semantic role labeling using dependency trees

Kadri Hacioglu

In this paper, a novel semantic role labeler based on dependency trees is developed. This is accomplished by formulating the semantic role labeling as a classification problem of dependency relations into one of several semantic roles. A dependency tree is created from a constituency parse of an input sentence. The dependency tree is then linearized into a sequence of dependency relations. A number of features are extracted for each dependency relation using a predefined linguistic context. Finally, the features are input to a set of one-versus-all support vector machine (SVM) classifiers to determine the corresponding semantic role label. We report results on CoNLL2004 shared task data using the representation and scoring scheme adopted for that task.
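One way to picture the linearization step is sketched below in Python. The head-index encoding, the relation names and the example sentence are illustrative assumptions, not the paper's data format or feature set.

```python
from typing import List, Tuple


def linearize(
    tokens: List[str], heads: List[int], relations: List[str]
) -> List[Tuple[str, str, str]]:
    """Flatten a dependency tree into (dependent, relation, head) triples in word order.

    heads[i] is the index of token i's head, or -1 for the root (assumed encoding).
    """
    triples = []
    for i, token in enumerate(tokens):
        head = "ROOT" if heads[i] < 0 else tokens[heads[i]]
        triples.append((token, relations[i], head))
    return triples


# Hypothetical example sentence and parse.
tokens = ["John", "opened", "the", "door"]
heads = [1, -1, 3, 1]
relations = ["subj", "root", "det", "obj"]

for triple in linearize(tokens, heads, relations):
    print(triple)
```

Each triple in the resulting sequence would then serve as one classification instance, with features drawn from its surrounding context.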

international conference on acoustics, speech, and signal processing | 2003

Recent improvements in the CU Sonic ASR system for noisy speech: the SPINE task

Bryan L. Pellom; Kadri Hacioglu

We report on recent improvements in the University of Colorado system for the DARPA/NRL Speech in Noisy Environments (SPINE) task. In particular, we describe our efforts on improving acoustic and language modeling for the task and investigate methods for unsupervised speaker and environment adaptation from limited data. We show that the MAPLR adaptation method outperforms single and multiple regression class MLLR on the SPINE task. Our current SPINE system uses the Sonic speech recognition engine that was developed at the University of Colorado. This system is shown to have a word error rate of 31.5% on the SPINE-2 evaluation data. These improvements amount to a 16% relative reduction in word error rate compared to our previous SPINE-2 system fielded in the November 2001 DARPA/NRL evaluation.
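The relation between the two reported figures can be worked through with a short Python sketch. The abstract does not state the earlier system's word error rate; the value computed below is only what the two reported numbers would imply if both refer to the same evaluation data, which is an assumption.

```python
def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction: the drop in WER as a fraction of the old WER."""
    return (old_wer - new_wer) / old_wer


def implied_baseline(new_wer: float, reduction: float) -> float:
    """Old WER implied by a new WER and a relative reduction (inverse of the above)."""
    return new_wer / (1.0 - reduction)


# Figures from the abstract: 31.5% WER after a 16% relative reduction.
baseline = implied_baseline(31.5, 0.16)
print(round(baseline, 1))
print(round(relative_reduction(baseline, 31.5), 2))
```

Under that assumption the earlier system would have been at roughly 37.5% word error rate, that is, an absolute drop of about 6 points.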

conference on computational natural language learning | 2005

Semantic Role Chunking Combining Complementary Syntactic Views

Sameer S. Pradhan; Kadri Hacioglu; Wayne H. Ward; James H. Martin; Daniel Jurafsky

This paper describes a semantic role labeling system that uses features derived from different syntactic views, and combines them within a phrase-based chunking paradigm. For an input sentence, syntactic constituent structure parses are generated by a Charniak parser and a Collins parser. Semantic role labels are assigned to the constituents of each parse using Support Vector Machine classifiers. The resulting semantic role labels are converted to an IOB representation. These IOB representations are used as additional features, along with flat syntactic chunks, by a chunking SVM classifier that produces the final SRL output. This strategy for combining features from three different syntactic views gives a significant improvement in performance over roles produced by using any one of the syntactic views individually.

international conference on computational linguistics | 2005

Automatic time expression labeling for English and Chinese text

Kadri Hacioglu; Ying Chen; Benjamin Douglas

In this paper, we describe systems for automatic labeling of time expressions occurring in English and Chinese text as specified in the ACE Temporal Expression Recognition and Normalization (TERN) task. We cast the chunking of text into time expressions as a tagging problem using a bracketed representation at token level, which takes into account embedded constructs. We adopted a left-to-right, token-by-token, discriminative, deterministic classification scheme to determine the tags for each token. A number of features are created from a predefined context centered at each token and augmented with decisions from a rule-based time expression tagger and/or a statistical time expression tagger trained on different types of text data, assuming they provide complementary information. We trained one-versus-all multi-class classifiers using support vector machines. We participated in the TERN 2004 recognition task and achieved competitive results.

north american chapter of the association for computational linguistics | 2003

Target word detection and semantic role chunking using support vector machines

Kadri Hacioglu; Wayne H. Ward

In this paper, the automatic labeling of semantic roles in a sentence is considered as a chunking task. We define a semantic chunk as the sequence of words that fills a semantic role defined in a semantic frame. It is straightforward to convert chunking into a tagging task using one of several IOB representations. Using this representation each word is tagged with I, which means that the word is inside a chunk, or with O, which means that the word is outside a chunk, or B, which means that the word is the beginning of a chunk. Tagging can also be seen as a multi-class classification problem. After recasting the multi-class problem as a number of binary-class problems, we use support vector machines to implement the binary classifiers. We explore two semantic chunking tasks. In the first task we simultaneously detect the target word and segments of semantic roles. In the second task, in addition, we label the semantic segments with their respective semantic role types. For both tasks, we present encouraging results of experiments carried out using the annotated FrameNet database.
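The conversion between chunks and per-word tags can be illustrated with a small Python sketch. It assumes one common IOB variant in which every chunk starts with a B tag and the role name is appended to the tag; the abstract does not say which of the several IOB representations was used, and the role names below are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, role)


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn non-overlapping role spans into one tag per token."""
    tags = ["O"] * n_tokens
    for start, end, role in spans:
        tags[start] = f"B-{role}"
        for i in range(start + 1, end):
            tags[i] = f"I-{role}"
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Recover role spans from a tag sequence produced by spans_to_iob."""
    spans: List[Span] = []
    start, role = None, ""
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, role))
            start = None
        if tag.startswith("B-"):
            start, role = i, tag[2:]
    return spans


# Hypothetical five-word sentence with two semantic chunks.
spans = [(0, 2, "Agent"), (3, 5, "Theme")]
tags = spans_to_iob(5, spans)
print(tags)
print(iob_to_spans(tags) == spans)
```

Once every word carries a tag, predicting the tag sequence becomes the multi-class problem that the binary classifiers are combined to solve.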

international conference on acoustics, speech, and signal processing | 2001

Confidence measures for spoken dialogue systems

R. San-Segundo; Bryan L. Pellom; Kadri Hacioglu; Wayne H. Ward; José Manuel Pardo

This paper provides improved confidence assessment for detection of word-level speech recognition errors, out of domain utterances and incorrect concepts in the CU Communicator system. New features from the speech understanding component are proposed for confidence annotation at the utterance and concept levels. We consider a neural network to combine all features in each level. Using the data collected from a live telephony system, it is shown that 53.2% of incorrectly recognized words, 53.2% of out of domain utterances and 50.1% of incorrect concepts are detected at a 5% false rejection rate. In addition, the confidence measures are used to improve the word recognition accuracy. Several hypotheses from different speech recognizers are compiled into a word-graph. The word-graph is searched for the hypothesis with the best confidence. We report a 14.0% relative word error rate reduction after this confidence rescoring.

international conference on acoustics, speech, and signal processing | 2001

Dialog-context dependent language modeling combining n-grams and stochastic context-free grammars

Kadri Hacioglu; Wayne H. Ward

We present our research on dialog dependent language modeling. In accordance with a speech (or sentence) production model in a discourse, we split language modeling into two components, namely dialog dependent concept modeling and syntactic modeling. The concept model is conditioned on the last question prompted by the dialog system and it is structured using n-grams. The syntactic model, which consists of a collection of stochastic context-free grammars, one for each concept, describes word sequences that may be used to express the concepts. The resulting LM is evaluated by rescoring N-best lists. We report significant perplexity improvement with moderate word error rate drop within the context of the CU Communicator System, a dialog system for making travel plans by accessing information about flights, hotels and car rentals.
