Publication


Featured research published by Xiaoman Pan.


North American Chapter of the Association for Computational Linguistics | 2015

Unsupervised Entity Linking with Abstract Meaning Representation

Xiaoman Pan; Taylor Cassidy; Ulf Hermjakob; Heng Ji; Kevin Knight

Most successful Entity Linking (EL) methods aim to link mentions to their referent entities in a structured Knowledge Base (KB) by comparing their respective contexts, often using similarity measures. While the KB structure is given, current methods have suffered from impoverished information representations on the mention side. In this paper, we demonstrate the effectiveness of Abstract Meaning Representation (AMR) (Banarescu et al., 2013) for selecting high-quality sets of entity “collaborators” to feed a simple similarity measure (Jaccard) to link entity mentions. Experimental results show that AMR captures contextual properties discriminative enough to make linking decisions without the need for EL training data, and that a system using AMR parsing output outperforms one using hand-labeled traditional semantic roles as the context representation for EL. Finally, we show promising preliminary results for using AMR to select sets of “coherent” entity mentions for collective entity linking.
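The linking step the abstract describes (comparing an AMR-derived set of mention “collaborators” against each candidate entity's KB neighbors with Jaccard similarity) can be illustrated with a minimal sketch. The collaborator set, candidate entities, and neighbor sets below are invented placeholders, not the paper's actual data or resources.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two string sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def link_mention(collaborators: set, candidates: dict) -> str:
    """Pick the KB candidate whose neighbor set best overlaps the mention's
    AMR-derived collaborator set (unsupervised, no EL training data)."""
    return max(candidates, key=lambda e: jaccard(collaborators, candidates[e]))

# Hypothetical example: a mention "Pan" with collaborators extracted from
# the AMR graph of its sentence, and two candidate KB entities.
collaborators = {"Heng Ji", "entity linking", "RPI"}
candidates = {
    "Xiaoman_Pan": {"Heng Ji", "entity linking", "AMR", "RPI"},
    "Pan_(mythology)": {"Greek mythology", "shepherds", "flute"},
}
print(link_mention(collaborators, candidates))  # -> "Xiaoman_Pan"
```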


North American Chapter of the Association for Computational Linguistics | 2016

CAMR at SemEval-2016 Task 8: An Extended Transition-based AMR Parser

Chuan Wang; Sameer Pradhan; Xiaoman Pan; Heng Ji; Nianwen Xue

This paper describes CAMR, the transition-based parser that we use in the SemEval-2016 Meaning Representation Parsing task. The main contribution of this paper is a description of the additional sources of information that we use as features in the parsing model to further boost its performance. We start with our existing AMR parser and experiment with three sets of new features: 1) rich named entities, 2) a verbalization list, and 3) semantic role labels. We also use the RPI Wikifier to wikify the concepts in the AMR graph. Our parser achieves a Smatch F-score of 62% on the official blind test set.


North American Chapter of the Association for Computational Linguistics | 2016

Name Tagging for Low-resource Incident Languages based on Expectation-driven Learning

Boliang Zhang; Xiaoman Pan; Tianlu Wang; Ashish Vaswani; Heng Ji; Kevin Knight; Daniel Marcu

In this paper we tackle a challenging name tagging problem in an emergent setting: the tagger needs to be completed within a few hours for a new incident language (IL) using very few resources. Inspired by observing how human annotators attack this challenge, we propose a new expectation-driven learning framework. In this framework we rapidly acquire, categorize, structure and zoom in on IL-specific expectations (rules, features, patterns, gazetteers, etc.) from various non-traditional sources: consulting and encoding linguistic knowledge from native speakers, mining and projecting patterns from both mono-lingual and cross-lingual corpora, and typing based on cross-lingual entity linking. We also propose a cost-aware combination approach to compose expectations. Experiments on seven low-resource languages demonstrate the effectiveness and generality of this framework: we are able to set up a name tagger for a new IL within two hours and achieve 33.8%-65.1% F-score.


Meeting of the Association for Computational Linguistics | 2014

Be Appropriate and Funny: Automatic Entity Morph Encoding

Boliang Zhang; Hongzhao Huang; Xiaoman Pan; Heng Ji; Kevin Knight; Zhen Wen; Yizhou Sun; Jiawei Han; Bülent Yener

Internet users are keen on creating different kinds of morphs to avoid censorship, express strong sentiment or humor. For example, in Chinese social media, users often use the entity morph “方便面 (Instant Noodles)” to refer to “周永康 (Zhou Yongkang)” because it shares one character “康 (Kang)” with the well-known brand of instant noodles “康师傅 (Master Kang)”. We developed a wide variety of novel approaches to automatically encode proper and interesting morphs, which can effectively pass decoding tests.


Meeting of the Association for Computational Linguistics | 2017

Cross-lingual Name Tagging and Linking for 282 Languages.

Xiaoman Pan; Boliang Zhang; Jonathan May; Joel Nothman; Kevin Knight; Heng Ji

The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for the 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by performing a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.
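The “silver-standard” annotation transfer the abstract mentions can be sketched roughly as follows: English entity types are projected onto another language's anchor mentions through cross-lingual links. The link table, type map, and titles below are hypothetical placeholders rather than the framework's real resources.

```python
# Hypothetical cross-lingual link table: foreign-language title -> English title.
cross_lingual_links = {
    "Nueva_York": "New_York_City",
    "Naciones_Unidas": "United_Nations",
}

# Hypothetical coarse types mined from the English KB.
english_kb_types = {
    "New_York_City": "GPE",
    "United_Nations": "ORG",
}

def project_silver_labels(foreign_anchor_titles):
    """Label foreign-language anchor mentions with the type of the English
    entity they map to, yielding 'silver-standard' training annotations."""
    silver = {}
    for title in foreign_anchor_titles:
        english_title = cross_lingual_links.get(title)
        if english_title in english_kb_types:
            silver[title] = english_kb_types[english_title]
    return silver

print(project_silver_labels(["Nueva_York", "Naciones_Unidas", "Sin_enlace"]))
# -> {'Nueva_York': 'GPE', 'Naciones_Unidas': 'ORG'}
```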


Empirical Methods in Natural Language Processing | 2016

The Gun Violence Database: A new task and data set for NLP.

Ellie Pavlick; Heng Ji; Xiaoman Pan; Chris Callison-Burch

We argue that NLP researchers are especially well-positioned to contribute to the national discussion about gun violence. Reasoning about the causes and outcomes of gun violence is typically dominated by politics and emotion, and data-driven research on the topic is stymied by a shortage of data and a lack of federal funding. However, data abounds in the form of unstructured text from news articles across the country. This is an ideal application of NLP technologies, such as relation extraction, coreference resolution, and event detection. We introduce a new and growing dataset, the Gun Violence Database, in order to facilitate the adaptation of current NLP technologies to the domain of gun violence, thus enabling better social science research on this important and under-resourced problem.


Meeting of the Association for Computational Linguistics | 2016

A Multi-media Approach to Cross-lingual Entity Knowledge Transfer

Di Lu; Xiaoman Pan; Nima Pourdamghani; Shih-Fu Chang; Heng Ji; Kevin Knight

When a large-scale incident or disaster occurs, there is often a great demand for rapidly developing a system to extract detailed and new information from low-resource languages (LLs). We propose a novel approach to discover comparable documents in high-resource languages (HLs), and project Entity Discovery and Linking results from HL documents back to LLs. We leverage a wide variety of language-independent forms from multiple data modalities, including image processing (image-to-image retrieval, visual similarity and face recognition) and sound matching. We also propose novel methods to learn entity priors from a large-scale HL corpus and knowledge base. Using Hausa and Chinese as the LLs and English as the HL, experiments show that our approach achieves a 36.1% higher Hausa name tagging F-score over a costly supervised model, and 9.4% higher Chinese-to-English Entity Linking accuracy over the state-of-the-art.
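One common way to realize the “entity priors” idea the abstract mentions is to estimate P(entity | mention) from anchor counts in a large high-resource-language corpus. The sketch below illustrates that general technique only; the anchor pairs and function name are invented for illustration and are not the paper's actual method or data.

```python
from collections import Counter, defaultdict

# Hypothetical (mention string, linked entity) pairs harvested from an
# HL corpus, e.g. Wikipedia anchor texts.
anchors = [
    ("Washington", "Washington,_D.C."),
    ("Washington", "Washington,_D.C."),
    ("Washington", "George_Washington"),
    ("Washington", "Washington_(state)"),
]

def entity_priors(anchor_pairs):
    """Estimate P(entity | mention) as normalized anchor counts."""
    counts = defaultdict(Counter)
    for mention, entity in anchor_pairs:
        counts[mention][entity] += 1
    return {
        mention: {e: c / sum(ctr.values()) for e, c in ctr.items()}
        for mention, ctr in counts.items()
    }

print(entity_priors(anchors)["Washington"])
# -> {'Washington,_D.C.': 0.5, 'George_Washington': 0.25, 'Washington_(state)': 0.25}
```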


International Joint Conference on Natural Language Processing | 2015

Context-aware Entity Morph Decoding

Boliang Zhang; Hongzhao Huang; Xiaoman Pan; Sujian Li; Chin Yew Lin; Heng Ji; Kevin Knight; Zhen Wen; Yizhou Sun; Jiawei Han; Bulent Yener

People create morphs, a special type of fake alternative names, to achieve certain communication goals such as expressing strong sentiment or evading censors. For example, “Black Mamba”, the name for a highly venomous snake, is a morph that Kobe Bryant created for himself due to his agility and aggressiveness in playing basketball games. This paper presents the first end-to-end context-aware entity morph decoding system that can automatically identify, disambiguate, and verify morph mentions based on specific contexts, and resolve them to target entities. Our approach is based on an absolute “cold-start”: it does not require any candidate morph or target entity lists as input, nor any manually constructed morph-target pairs for training. We design a semi-supervised collective inference framework for morph mention extraction, and compare various deep learning based approaches for morph resolution. Our approach achieved significant improvement over the state-of-the-art method (Huang et al., 2013), which used a large amount of training data.


Big Data | 2017

Liberal entity extraction: Rapid construction of fine-grained entity typing systems

Lifu Huang; Jonathan May; Xiaoman Pan; Heng Ji; Xiang Ren; Jiawei Han; Lin Zhao; James A. Hendler

The ability to automatically recognize and type entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
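As a rough illustration of the clustering step described above, combined mention representations can be grouped hierarchically so that each cluster stands for one (unnamed) entity type. This is only a toy sketch under stated assumptions: the tiny 2-dimensional vectors, mention strings, and use of scikit-learn's agglomerative clustering are illustrative stand-ins, not the paper's actual representations or algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical low-dimensional stand-ins for the three representations the
# framework combines: general semantic, specific context, and KB-based.
mentions  = ["Obama", "Putin", "Pfizer", "Moderna"]
general   = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
context   = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.9], [0.3, 0.7]])
knowledge = np.array([[0.8, 0.1], [0.9, 0.2], [0.1, 0.8], [0.2, 0.9]])

# Concatenate the representations and cluster mentions hierarchically;
# each resulting cluster acts as one induced entity type.
features = np.hstack([general, context, knowledge])
labels = AgglomerativeClustering(n_clusters=2).fit_predict(features)
for mention, label in zip(mentions, labels):
    print(mention, "-> cluster", label)
```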


Proceedings of the Sixth Named Entity Workshop | 2016

Leveraging Entity Linking and Related Language Projection to Improve Name Transliteration.

Ying Lin; Xiaoman Pan; Aliya Deri; Heng Ji; Kevin Knight

Traditional name transliteration methods largely ignore source context information and inter-dependency among entities for entity disambiguation. We propose a novel approach that leverages state-of-the-art Entity Linking (EL) techniques to automatically correct name transliteration results, using collective inference from source contexts and additional evidence from the knowledge base. Experiments on transliterating names from seven languages into English demonstrate that our approach achieves a 2.6% to 15.7% absolute gain over the baseline model and significantly advances the state-of-the-art. When contextual information exists, our approach can achieve further gains (24.2%) by collectively transliterating and disambiguating multiple related entities. We also show that combining Entity Linking with resources projected from related languages obtains performance comparable to a method using the same amount of training pairs in the original languages without Entity Linking.

Collaboration


Dive into Xiaoman Pan's collaborations.

Top Co-Authors

Heng Ji
Rensselaer Polytechnic Institute

Boliang Zhang
Rensselaer Polytechnic Institute

Kevin Knight
University of Southern California

Di Lu
Rensselaer Polytechnic Institute

Jonathan May
University of Southern California

Lifu Huang
Rensselaer Polytechnic Institute

Ying Lin
Rensselaer Polytechnic Institute

Nima Pourdamghani
University of Southern California

Ulf Hermjakob
Information Sciences Institute