Boliang Zhang
Rensselaer Polytechnic Institute
Publication
Featured research published by Boliang Zhang.
data and text mining in bioinformatics | 2014
Jinguang Zheng; Daniel P. Howsmon; Boliang Zhang; Juergen Hahn; Deborah L. McGuinness; James A. Hendler; Heng Ji
The Entity Linking (EL) task links entity mentions from an unstructured document to entities in a knowledge base. Although this problem is well studied in news and social media, it has not received much attention in the life science domain. One outcome of tackling the EL problem in the life science domain is to enable scientists to build computational models of biological processes with more efficiency. However, simply applying a news-trained entity linker produces inadequate results. Since existing supervised approaches require a large amount of manually labeled training data, which is currently unavailable for the life science domain, we propose a novel unsupervised collective inference approach to link entities from unstructured full texts of biomedical literature to 300 ontologies. The approach leverages the rich semantic information and structures in ontologies for similarity computation and entity ranking. Without using any manual annotation, our approach significantly outperforms a state-of-the-art supervised EL method (9% absolute gain in linking accuracy). Furthermore, the state-of-the-art supervised EL method requires 15,000 manually annotated entity mentions for training. These promising results establish a benchmark for the EL task in the life science domain. We also provide in-depth analysis and discussion of both the challenges and opportunities of automatic knowledge enrichment for scientific literature. In this paper, we propose a novel unsupervised collective inference approach to address the EL problem in a new domain. We show that our unsupervised approach is able to outperform a current state-of-the-art supervised approach that has been trained with a large amount of manually labeled data. Life science presents an underrepresented domain for applying EL techniques.
By providing a small benchmark data set and identifying opportunities, we hope to stimulate discussions across natural language processing and bioinformatics and motivate others to develop techniques for this largely untapped domain.
north american chapter of the association for computational linguistics | 2016
Boliang Zhang; Xiaoman Pan; Tianlu Wang; Ashish Vaswani; Heng Ji; Kevin Knight; Daniel Marcu
In this paper we tackle a challenging name tagging problem in an emergent setting: the tagger needs to be complete within a few hours for a new incident language (IL) using very few resources. Inspired by observing how human annotators attack this challenge, we propose a new expectation-driven learning framework. In this framework we rapidly acquire, categorize, structure and zoom in on IL-specific expectations (rules, features, patterns, gazetteers, etc.) from various non-traditional sources: consulting and encoding linguistic knowledge from native speakers, mining and projecting patterns from both mono-lingual and cross-lingual corpora, and typing based on cross-lingual entity linking. We also propose a cost-aware combination approach to compose expectations. Experiments on seven low-resource languages demonstrate the effectiveness and generality of this framework: we are able to set up a name tagger for a new IL within two hours and achieve 33.8%-65.1% F-score.
meeting of the association for computational linguistics | 2014
Boliang Zhang; Hongzhao Huang; Xiaoman Pan; Heng Ji; Kevin Knight; Zhen Wen; Yizhou Sun; Jiawei Han; Bülent Yener
Internet users are keen on creating different kinds of morphs to avoid censorship, express strong sentiment or humor. For example, in Chinese social media, users often use the entity morph “方便面 (Instant Noodles)” to refer to “周永康 (Zhou Yongkang)” because it shares one character “康 (Kang)” with the well-known brand of instant noodles “康师傅 (Master Kang)”. We developed a wide variety of novel approaches to automatically encode proper and interesting morphs, which can effectively pass decoding tests.
meeting of the association for computational linguistics | 2017
Xiaoman Pan; Boliang Zhang; Jonathan May; Joel Nothman; Kevin Knight; Heng Ji
The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by applying a series of new KB mining methods: generating “silver-standard” annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.
international joint conference on natural language processing | 2015
Boliang Zhang; Hongzhao Huang; Xiaoman Pan; Sujian Li; Chin Yew Lin; Heng Ji; Kevin Knight; Zhen Wen; Yizhou Sun; Jiawei Han; Bulent Yener
People create morphs, a special type of fake alternative names, to achieve certain communication goals such as expressing strong sentiment or evading censors. For example, “Black Mamba”, the name for a highly venomous snake, is a morph that Kobe Bryant created for himself due to his agility and aggressiveness in playing basketball games. This paper presents the first end-to-end context-aware entity morph decoding system that can automatically identify, disambiguate, and verify morph mentions based on specific contexts, and resolve them to target entities. Our approach is based on an absolute “cold-start”: it does not require any candidate morph or target entity lists as input, nor any manually constructed morph-target pairs for training. We design a semi-supervised collective inference framework for morph mention extraction, and compare various deep learning based approaches for morph resolution. Our approach achieved significant improvement over the state-of-the-art method (Huang et al., 2013), which used a large amount of training data.
BMC Medical Informatics and Decision Making | 2015
Jinguang Zheng; Daniel P. Howsmon; Boliang Zhang; Juergen Hahn; Deborah L. McGuinness; James A. Hendler; Heng Ji
Background: The Entity Linking (EL) task links entity mentions from an unstructured document to entities in a knowledge base. Although this problem is well studied in news and social media, it has not received much attention in the life science domain. One outcome of tackling the EL problem in the life science domain is to enable scientists to build computational models of biological processes with more efficiency. However, simply applying a news-trained entity linker produces inadequate results.
Methods: Since existing supervised approaches require a large amount of manually labeled training data, which is currently unavailable for the life science domain, we propose a novel unsupervised collective inference approach to link entities from unstructured full texts of biomedical literature to 300 ontologies. The approach leverages the rich semantic information and structures in ontologies for similarity computation and entity ranking.
Results: Without using any manual annotation, our approach significantly outperforms a state-of-the-art supervised EL method (9% absolute gain in linking accuracy). Furthermore, the state-of-the-art supervised EL method requires 15,000 manually annotated entity mentions for training. These promising results establish a benchmark for the EL task in the life science domain. We also provide in-depth analysis and discussion of both the challenges and opportunities of automatic knowledge enrichment for scientific literature.
Conclusions: In this paper, we propose a novel unsupervised collective inference approach to address the EL problem in a new domain. We show that our unsupervised approach is able to outperform a current state-of-the-art supervised approach that has been trained with a large amount of manually labeled data. Life science presents an underrepresented domain for applying EL techniques. By providing a small benchmark data set and identifying opportunities, we hope to stimulate discussions across natural language processing and bioinformatics and motivate others to develop techniques for this largely untapped domain.
Machine Translation | 2018
Ulf Hermjakob; Qiang Li; Daniel Marcu; Jonathan May; Sebastian J. Mielke; Nima Pourdamghani; Michael Pust; Xing Shi; Kevin Knight; Tomer Levinboim; Kenton Murray; David Chiang; Boliang Zhang; Xiaoman Pan; Di Lu; Ying Lin; Heng Ji
We describe novel approaches to tackling the problem of natural language processing for low-resource languages. The approaches are embodied in systems for name tagging and machine translation (MT) that we constructed to participate in the NIST LoReHLT evaluation in 2016. Our methods include universal tools, rapid resource and knowledge acquisition, rapid language projection, and joint methods for MT and name tagging.
international conference on computational linguistics | 2016
Dongxu Zhang; Boliang Zhang; Xiaoman Pan; Xiaocheng Feng; Heng Ji; Weiran Xu
international joint conference on natural language processing | 2017
Boliang Zhang; Di Lu; Xiaoman Pan; Ying Lin; Halidanmu Abudukelimu; Heng Ji; Kevin Knight
text analysis conference | 2017
Mohamed Al-Badrashiny; Jason Bolton; Arun Tejasvi Chaganty; Kevin Clark; Craig Harman; Lifu Huang; Matthew Lamm; Jinhao Lei; Di Lu; Xiaoman Pan; Ashwin Paranjape; Ellie Pavlick; Haoruo Peng; Peng Qi; Pushpendre Rastogi; Abigail See; Kai Sun; Max Thomas; Chen-Tse Tsai; Hao Wu; Boliang Zhang; Chris Callison-Burch; Claire Cardie; Heng Ji; Christopher D. Manning; Smaranda Muresan; Owen Rambow; Dan Roth; Mark Sammons; Benjamin Van Durme