Publication


Featured research published by Lingpeng Kong.


Empirical Methods in Natural Language Processing | 2014

A Dependency Parser for Tweets

Lingpeng Kong; Nathan Schneider; Swabha Swayamdipta; Archna Bhatia; Chris Dyer; Noah A. Smith

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.


Conference of the International Speech Communication Association | 2016

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

Liang Lu; Lingpeng Kong; Chris Dyer; Noah A. Smith; Steve Renals

We study the segmental recurrent neural network for end-to-end acoustic modelling. This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction. Compared to most previous CRF-based acoustic models, it does not rely on an external system to provide features or segmentation boundaries. Instead, this model marginalises out all the possible segmentations, and features are extracted from the RNN trained together with the segmental CRF. In essence, this model is self-contained and can be trained end-to-end. In this paper, we discuss practical training and decoding issues as well as the method to speed up the training in the context of speech recognition. We performed experiments on the TIMIT dataset and achieved a 17.3% phone error rate (PER) from first-pass decoding, the best reported result using CRFs, despite using only a zeroth-order CRF and no language model.
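The marginalisation over all possible segmentations that the abstract describes reduces to a forward-style dynamic program over segment end points. A minimal sketch in plain Python, assuming segment scores come from a user-supplied function (the function and its uniform scores here are hypothetical stand-ins for the RNN-derived scores):

```python
def partition(seg_score, num_frames, max_len):
    """Sum the scores of all segmentations of a frame sequence.

    seg_score(i, j) scores the segment covering frames i..j-1;
    alpha[t] accumulates the total score of all segmentations of
    the first t frames, built from shorter prefixes.
    """
    alpha = [0.0] * (num_frames + 1)
    alpha[0] = 1.0  # empty prefix
    for t in range(1, num_frames + 1):
        for seg_len in range(1, min(max_len, t) + 1):
            alpha[t] += alpha[t - seg_len] * seg_score(t - seg_len, t)
    return alpha[num_frames]
```

With a uniform score of 1 for every segment, `partition` simply counts segmentations, which makes the recursion easy to sanity-check by hand.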


Empirical Methods in Natural Language Processing | 2015

Bayesian Optimization of Text Representations

Dani Yogatama; Lingpeng Kong; Noah A. Smith

When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts. These choices can have a big effect on performance, but they are often uninteresting to researchers or practitioners who simply need a module that performs well. We propose an approach to optimizing over this space of choices, formulating the problem as global optimization. We apply a sequential model-based optimization technique and show that our method makes standard linear models competitive with more sophisticated, expensive state-of-the-art methods based on latent variable models or neural networks on various topic classification and sentiment analysis problems. Our approach is a first step towards black-box NLP systems that work with raw text and do not require manual tuning.
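The sequential model-based optimization loop the abstract describes can be sketched in a few lines of plain Python. Everything below is illustrative: the search space, the synthetic `evaluate` score, and the nearest-neighbour surrogate are assumptions standing in for the paper's actual model and objective:

```python
import itertools
import random

# Hypothetical space of text-representation choices:
# n-gram range, lowercasing, and term weighting.
SPACE = list(itertools.product([(1, 1), (1, 2)], [True, False], ["tf", "tfidf"]))

def evaluate(cfg):
    """Stand-in for 'train a linear model with this representation
    and return dev-set accuracy' (a synthetic score for the sketch)."""
    ngrams, lower, weighting = cfg
    return 0.80 + 0.05 * (weighting == "tfidf") + 0.03 * lower + 0.01 * (ngrams == (1, 2))

def smbo(n_init=3, n_iter=5, seed=0):
    """Sequential model-based optimization: evaluate a few random
    configs, then repeatedly evaluate the unseen config the surrogate
    predicts to score highest."""
    rng = random.Random(seed)
    observed = {cfg: evaluate(cfg) for cfg in rng.sample(SPACE, n_init)}
    for _ in range(n_iter):
        candidates = [c for c in SPACE if c not in observed]
        if not candidates:
            break
        def predicted(c):
            # Surrogate: score of the most similar observed config
            # (similarity = number of shared components).
            nearest = max(observed, key=lambda o: sum(a == b for a, b in zip(c, o)))
            return observed[nearest]
        best_candidate = max(candidates, key=predicted)
        observed[best_candidate] = evaluate(best_candidate)
    return max(observed.items(), key=lambda kv: kv[1])
```

Real systems replace the nearest-neighbour surrogate with a probabilistic model and an acquisition function such as expected improvement; the loop structure is the same.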


North American Chapter of the Association for Computational Linguistics | 2015

Transforming Dependencies into Phrase Structures

Lingpeng Kong; Alexander M. Rush; Noah A. Smith

We present a new algorithm for transforming dependency parse trees into phrase-structure parse trees. We cast the problem as structured prediction and learn a statistical model. Our algorithm is faster than traditional phrase-structure parsing and achieves 90.4% English parsing accuracy and 82.4% Chinese parsing accuracy, near the state of the art on both benchmarks.
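A naive deterministic baseline for such a transformation simply projects one constituent per head word. The paper's learned structured-prediction model goes well beyond this, but the skeleton of the conversion looks roughly like this hypothetical sketch:

```python
def dep_to_phrase(words, heads):
    """Project a dependency tree onto a bracketing: each head word
    forms one constituent spanning itself and all its dependents.

    words: list of tokens; heads: 1-indexed head for each token,
    with 0 marking the root.
    """
    children = {i: [] for i in range(len(words) + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)

    def build(i):
        kids = children[i]
        if not kids:
            return words[i - 1]
        # Dependents to the left, then the head word, then the right.
        parts = ([build(k) for k in kids if k < i]
                 + [words[i - 1]]
                 + [build(k) for k in kids if k > i])
        return "(" + " ".join(parts) + ")"

    return build(children[0][0])
```

For "the dog barks" with heads [2, 3, 0], this yields the bracketing `((the dog) barks)`; the learned model instead scores many candidate bracketings and picks the best.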


Empirical Methods in Natural Language Processing | 2014

Dependency Parsing for Weibo: An Efficient Probabilistic Logic Programming Approach

William Yang Wang; Lingpeng Kong; Kathryn Mazaitis; William W. Cohen

Dependency parsing is a core task in NLP, and it is widely used by many applications such as information extraction, question answering, and machine translation. In the era of social media, a big challenge is that parsers trained on traditional newswire corpora typically suffer from the domain mismatch issue, and thus perform poorly on social media data. We present a new GFL/FUDG-annotated Chinese treebank with more than 18K tokens from Sina Weibo (the Chinese equivalent of Twitter). We formulate the dependency parsing problem as many small and parallelizable arc prediction tasks: for each task, we use a programmable probabilistic first-order logic to infer the dependency arc of a token in the sentence. In experiments, we show that the proposed model outperforms an off-the-shelf Stanford Chinese parser, as well as a strong MaltParser baseline that is trained on the same in-domain data.


Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing | 2015

ACBiMA: Advanced Chinese Bi-Character Word Morphological Analyzer

Ting-Hao (Kenneth) Huang; Yun-Nung Chen; Lingpeng Kong

While morphological information has been demonstrated to be useful for various Chinese NLP tasks, there is still a lack of complete theories, category schemes, and toolkits for Chinese morphology. This paper focuses on the morphological structures of Chinese bi-character words, for which a corpus was collected based on a well-defined morphological type scheme covering both Chinese derived words and compound words. With this corpus, a morphological analyzer was developed to classify Chinese bi-character words into the defined categories; it outperforms strong baselines, achieves about 66% macro F-measure for compound words, and effectively covers derived words.


International Conference on Asian Language Processing | 2011

Improving Chinese Dependency Parsing with Self-Disambiguating Patterns

Likun Qiu; Lei Wu; Kai Zhao; Changjian Hu; Lingpeng Kong

To solve the data sparseness problem in dependency parsing, most previous studies used features constructed from large-scale auto-parsed data. Unlike previous work, we propose a new approach to improve dependency parsing with context-free dependency triples (CDT) extracted by using self-disambiguating patterns (SDP). The use of SDP makes it possible to avoid the dependency on a baseline parser and explore the influence of different types of substructures one by one. Additionally, taking the available CDTs as seeds, a label propagation process is used to tag a large number of unlabeled word pairs as CDTs. Experiments show that, when CDT features are integrated into a maximum spanning tree (MST) dependency parser, the new parser improves significantly over the baseline MST parser. Comparative results also show that CDTs with dependency relation labels perform much better than CDTs without them.
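The label propagation step, in which seed CDTs tag unlabeled word pairs, can be sketched as iterative propagation over a weighted similarity graph. A toy version in plain Python (the graph, seeds, and function names are all hypothetical, not the paper's actual setup):

```python
def label_propagate(nodes, edges, seeds, iters=20):
    """Spread seed labels over a weighted, undirected similarity graph.

    nodes: iterable of node ids; edges: {(u, v): weight};
    seeds: {node: label}. Returns the highest-scoring label per node.
    """
    labels = sorted(set(seeds.values()))
    # score[n][l]: current belief that node n carries label l
    score = {n: {l: float(seeds.get(n) == l) for l in labels} for n in nodes}
    nbrs = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    for _ in range(iters):
        new = {}
        for n in nodes:
            if n in seeds:          # seed labels stay clamped
                new[n] = score[n]
                continue
            tot = {l: 0.0 for l in labels}
            for m, w in nbrs[n]:
                for l in labels:
                    tot[l] += w * score[m][l]
            z = sum(tot.values()) or 1.0
            new[n] = {l: tot[l] / z for l in labels}
        score = new
    return {n: max(score[n], key=score[n].get) for n in nodes}
```

Unlabeled word pairs strongly connected to a seed inherit its label, so a small set of high-precision seed CDTs can tag a much larger pool of pairs.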


arXiv: Machine Learning | 2017

DyNet: The Dynamic Neural Network Toolkit

Graham Neubig; Chris Dyer; Yoav Goldberg; Austin Matthews; Waleed Ammar; Antonios Anastasopoulos; Miguel Ballesteros; David Chiang; Daniel Clothiaux; Trevor Cohn; Kevin Duh; Manaal Faruqui; Cynthia Gan; Dan Garrette; Yangfeng Ji; Lingpeng Kong; Adhiguna Kuncoro; Gaurav Kumar; Chaitanya Malaviya; Paul Michel; Yusuke Oda; Matthew Richardson; Naomi Saphra; Swabha Swayamdipta; Pengcheng Yin


international conference on learning representations | 2016

Segmental Recurrent Neural Networks

Lingpeng Kong; Chris Dyer; Noah A. Smith


arXiv: Computation and Language | 2015

Document Context Language Models

Yangfeng Ji; Trevor Cohn; Lingpeng Kong; Chris Dyer; Jacob Eisenstein

Collaboration


Dive into Lingpeng Kong's collaborations.

Top Co-Authors

Noah A. Smith, University of Washington
Chris Dyer, Carnegie Mellon University
Liang Lu, University of Edinburgh
Graham Neubig, Carnegie Mellon University