Publication


Featured research published by Xiaoqiang Luo.


Empirical Methods in Natural Language Processing | 2005

On Coreference Resolution Performance Metrics

Xiaoqiang Luo

The paper proposes a Constrained Entity-Alignment F-Measure (CEAF) for evaluating coreference resolution. The metric is computed by aligning reference and system entities (or coreference chains) with the constraint that a system (reference) entity is aligned with at most one reference (system) entity. We show that the best alignment is a maximum bipartite matching problem which can be solved by the Kuhn-Munkres algorithm. Comparative experiments are conducted to show that the widely-known MUC F-measure has serious flaws in evaluating a coreference system. The proposed metric is also compared with the ACE-Value, the official evaluation metric in the Automatic Content Extraction (ACE) task, and we conclude that the proposed metric possesses some properties such as symmetry and better interpretability missing in the ACE-Value.
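As a rough illustration of the alignment idea in this abstract, the sketch below scores a system output against a reference using a one-to-one entity alignment. It rests on several assumptions: entities are Python sets of mention identifiers, the similarity between two entities is the number of mentions they share, and the best alignment is found by brute force over permutations rather than by the Kuhn-Munkres algorithm the paper names. The function names are hypothetical, and the brute-force search is only practical for a handful of entities.

```python
from itertools import permutations


def phi(entity_a, entity_b):
    # Assumed similarity: number of mentions the two entities share.
    return len(entity_a & entity_b)


def best_alignment_score(key, system):
    # Brute-force stand-in for Kuhn-Munkres: try every one-to-one mapping
    # from the smaller entity list into the larger one, keep the best total.
    small, large = (key, system) if len(key) <= len(system) else (system, key)
    best = 0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(phi(small[i], large[j]) for i, j in enumerate(perm))
        best = max(best, score)
    return best


def ceaf(key, system):
    # Precision normalizes by system self-similarity, recall by key self-similarity.
    total = best_alignment_score(key, system)
    recall = total / sum(phi(k, k) for k in key)
    precision = total / sum(phi(s, s) for s in system)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


key = [{"a", "b", "c"}, {"d", "e"}]      # reference entities
system = [{"a", "b"}, {"c", "d", "e"}]   # system entities
precision, recall, f_measure = ceaf(key, system)
```

In this toy case the best alignment pairs the first key entity with the first system entity and the second with the second, sharing four of five mentions, so precision, recall, and F-measure all come out at approximately 0.8. Swapping `key` and `system` swaps precision and recall, which reflects the symmetry the abstract mentions.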


Meeting of the Association for Computational Linguistics | 2004

A Mention-Synchronous Coreference Resolution Algorithm Based On the Bell Tree

Xiaoqiang Luo; Abraham Ittycheriah; Hongyan Jing; Nanda Kambhatla; Salim Roukos

This paper proposes a new approach for coreference resolution which uses the Bell tree to represent the search space and casts the coreference resolution problem as finding the best path from the root of the Bell tree to the leaf nodes. A Maximum Entropy model is used to rank these paths. The coreference performance on the 2002 and 2003 Automatic Content Extraction (ACE) data will be reported. We also train a coreference system using the MUC6 data and competitive results are obtained.
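The search space described in this abstract can be sketched as follows. This is a minimal enumeration only, assuming mentions are processed in order and each one either joins an existing partial entity or starts a new one; the function names are hypothetical, and the Maximum Entropy ranking of paths is not modelled here.

```python
def expand(partition, mention):
    """Yield the children of a Bell tree node for the next mention."""
    # Link the mention to each existing partial entity in turn...
    for i in range(len(partition)):
        yield partition[:i] + [partition[i] + [mention]] + partition[i + 1:]
    # ...or start a new entity with it.
    yield partition + [[mention]]


def bell_tree_leaves(mentions):
    """Return every leaf (complete partition) reachable from the root."""
    layer = [[]]  # the root holds an empty partition
    for mention in mentions:
        layer = [child for node in layer for child in expand(node, mention)]
    return layer


leaves = bell_tree_leaves(["m1", "m2", "m3"])
```

For three mentions this yields five leaves, and for four mentions fifteen, which are the Bell numbers that give the tree its name. The rapid growth of the leaf count is the reason a practical system would need to prune paths rather than enumerate them all.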


Meeting of the Association for Computational Linguistics | 2014

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation

Sameer Pradhan; Xiaoqiang Luo; Marta Recasens; Eduard H. Hovy; Vincent Ng; Michael Strube

The definitions of two coreference scoring metrics, B3 and CEAF, are underspecified with respect to predicted, as opposed to key (or gold) mentions. Several variations have been proposed that manipulate either, or both, the key and predicted mentions in order to get a one-to-one mapping. On the other hand, the metric BLANC was, until recently, limited to scoring partitions of key mentions. In this paper, we (i) argue that mention manipulation for scoring predicted mentions is unnecessary, and potentially harmful as it could produce unintuitive results; (ii) illustrate the application of all these measures to scoring predicted mentions; (iii) make available an open-source, thoroughly-tested reference implementation of the main coreference evaluation measures; and (iv) rescore the results of the CoNLL-2011/2012 shared task systems with this implementation. This will help the community accurately measure and compare new end-to-end coreference resolution algorithms.


Meeting of the Association for Computational Linguistics | 2005

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution

Imed Zitouni; Jeffrey S. Sorensen; Xiaoqiang Luo; Radu Florian

Arabic presents an interesting challenge to natural language processing, being a highly inflected and agglutinative language. In particular, this paper presents an in-depth investigation of the entity detection and recognition (EDR) task for Arabic. We start by highlighting why segmentation is a necessary prerequisite for EDR, continue by presenting a finite-state statistical segmenter, and then examine how the resulting segments can be better included into a mention detection system and an entity recognition system; both systems are statistical, built around the maximum entropy principle. Experiments on a clearly stated partition of the ACE 2004 data show that stem-based features can significantly improve the performance of the EDT system by 2 absolute F-measure points. The system presented here had a competitive performance in the ACE 2004 evaluation.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Probabilistic classification of HMM states for large vocabulary continuous speech recognition

Xiaoqiang Luo; Frederick Jelinek

In state-of-the-art large vocabulary continuous speech recognition (LVCSR) systems, HMM state-tying is often used to achieve a good balance between model resolution and robustness. In this paradigm, tied HMM states share a single set of parameters and are indistinguishable. To capture the fine differences among tied HMM states, a probabilistic classification of HMM states (PCHMM) is proposed in this paper for LVCSR. In particular, a distribution from an HMM state to classes is introduced. It is shown that the state-to-class distribution can be estimated together with conventional HMM parameters within the EM (Dempster et al., 1977) framework. Compared with HMM state-tying, probabilistic classification of HMM states makes more efficient use of model parameters. It also makes the acoustic model more robust against the possible mismatch or variation between training and test data. The viability of this approach is verified by the significant reduction of word error rate (WER) on the Switchboard (Godfrey et al., 1992) task.


Empirical Methods in Natural Language Processing | 2003

HowtogetaChineseName(Entity): segmentation and combination issues

Hongyan Jing; Radu Florian; Xiaoqiang Luo; Tong Zhang; Abraham Ittycheriah

When building a Chinese named entity recognition system, one must deal with certain language-specific issues such as whether the model should be based on characters or words. While there is no unique answer to this question, we discuss in detail advantages and disadvantages of each model, identify problems in segmentation and suggest possible solutions, presenting our observations, analysis, and experimental results. The second topic of this paper is classifier combination. We present four classifiers for Chinese named entity recognition and describe various methods for combining their outputs. The results demonstrate that classifier combination is an effective technique for improving system performance: experiments over a large annotated corpus of fine-grained entity types exhibit a 10% relative reduction in F-measure error.


Meeting of the Association for Computational Linguistics | 1996

An Iterative Algorithm to Build Chinese Language Models

Xiaoqiang Luo; Salim Roukos

We present an iterative procedure to build a Chinese language model (LM). We segment Chinese text into words based on a word-based Chinese language model. However, the construction of a Chinese LM itself requires word boundaries. To get out of the chicken-and-egg problem, we propose an iterative procedure that alternates two operations: segmenting text into words and building an LM. Starting with an initial segmented corpus and an LM based upon it, we use a Viterbi-like algorithm to segment another set of data. Then, we build an LM based on the second set and use the resulting LM to segment again the first corpus. The alternating procedure provides a self-organized way for the segmenter to detect automatically unseen words and correct segmentation errors. Our preliminary experiment shows that the alternating procedure not only improves the accuracy of our segmentation, but discovers unseen words surprisingly well. The resulting word-based LM has a perplexity of 188 for a general Chinese corpus.
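The alternation between segmenting and LM building can be sketched in a few lines. This is a toy version under strong assumptions: the LM is a plain unigram model (the abstract does not say which order is used), unknown words are limited to single characters with a fixed penalty, and all names and parameters (`build_lm`, `segment`, `alternate`, `max_len`, `unk_logp`) are hypothetical.

```python
import math
from collections import Counter


def build_lm(segmented_corpus):
    # Unigram relative frequencies over all words in the segmented corpus.
    counts = Counter(word for sentence in segmented_corpus for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def segment(text, lm, max_len=4, unk_logp=-20.0):
    # Viterbi-like dynamic program: best[i] holds the best log-probability
    # of any segmentation of text[:i] and the start of its last word.
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lm:
                logp = math.log(lm[word])
            elif len(word) == 1:
                logp = unk_logp  # assumed penalty for an unseen character
            else:
                continue
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the text.
    words, i = [], len(text)
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]


def alternate(segmented_a, raw_b, rounds=2):
    # Alternate: LM from corpus A segments B, LM from B re-segments A.
    segmented_b = []
    for _ in range(rounds):
        lm_a = build_lm(segmented_a)
        segmented_b = [segment(text, lm_a) for text in raw_b]
        lm_b = build_lm(segmented_b)
        segmented_a = [segment("".join(s), lm_b) for s in segmented_a]
    return segmented_a, segmented_b
```

With Latin letters standing in for Chinese characters, `segment("abcab", build_lm([["ab", "c"], ["ab", "ab"]]))` recovers the words `ab`, `c`, `ab`. The toy does not reproduce the unseen-word discovery reported in the abstract, since it never proposes multi-character words outside the current vocabulary.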


International Conference on Acoustics, Speech, and Signal Processing | 2001

Speech recognition for DARPA Communicator

Andrew Aaron; Scott Saobing Chen; Paul S. Cohen; Satya Dharanipragada; Ellen Eide; Martin Franz; Jean-Michel LeRoux; Xiaoqiang Luo; Benoît Maison; Lidia Mangu; T. Mathes; Miroslav Novak; Peder A. Olsen; Michael Picheny; Harry Printz; Bhuvana Ramabhadran; Andrej Sakrajda; George Saon; Borivoj Tydlitát; Karthik Visweswariah; D. Yuk

We report the results of investigations in acoustic modeling, language modeling and decoding techniques, for the DARPA Communicator, a speaker-independent, telephone-based dialog system. By a combination of methods, including enlarging the acoustic model, augmenting the recognizer vocabulary, conditioning the language model upon the dialog state, and applying a post-processing decoding method, we lowered the overall word error rate from 21.9% to 15.0%, a gain of 6.9% absolute and 31.5% relative.
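The absolute and relative gains quoted in this abstract follow directly from the two word error rates. The helper below is a hypothetical illustration of that arithmetic, not part of the reported system.

```python
def wer_reduction(baseline_wer, improved_wer):
    """Return (absolute, relative) WER reduction, both in percent."""
    absolute = baseline_wer - improved_wer
    relative = absolute / baseline_wer * 100.0
    return absolute, relative


# Figures taken from the abstract: 21.9% baseline, 15.0% after improvements.
absolute, relative = wer_reduction(21.9, 15.0)
print(f"{absolute:.1f}% absolute, {relative:.1f}% relative")
```

The relative figure divides the absolute gain by the baseline rate (6.9 / 21.9), which is why it is much larger than the absolute one.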


IEEE Transactions on Audio, Speech, and Language Processing | 2009

A Cascaded Approach to Mention Detection and Chaining in Arabic

Imed Zitouni; Xiaoqiang Luo; Radu Florian

This paper presents a fully statistical approach to Arabic mention detection and chaining, built around the maximum entropy principle. The presented system takes a cascade approach to processing an input document, by first detecting mentions in the document and then chaining the identified mentions into entities. Both system components use a common maximum entropy framework, which allows the integration of a large array of feature types, including lexical, morphological, syntactic, and semantic features. Arabic offers additional challenges for this task (when compared with English, for example), as segmentation is a needed processing step, so that one can correctly identify and resolve enclitic pronouns. The system presented has obtained very competitive performance in the automatic content extraction (ACE) evaluation program.


North American Chapter of the Association for Computational Linguistics | 2009

Classifier Combination Techniques Applied to Coreference Resolution

Smita Vemulapalli; Xiaoqiang Luo; John F. Pitrelli; Imed Zitouni

This paper examines the applicability of classifier combination approaches such as bagging and boosting for coreference resolution. To the best of our knowledge, this is the first effort that utilizes such techniques for coreference resolution. In this paper, we provide experimental evidence which indicates that the accuracy of the coreference engine can potentially be increased by use of bagging and boosting methods, without any additional features or training data. We implement and evaluate combination techniques at the mention, entity and document level, and also address issues such as entity alignment that are specific to coreference resolution.
