Libin Shen
University of Pennsylvania
Publications
Featured research published by Libin Shen.
Machine Learning | 2005
Libin Shen; Aravind K. Joshi
This work is inspired by the so-called reranking tasks in natural language processing. In this paper, we first study recently proposed ranking, reranking, and ordinal regression algorithms in the context of ranks and margins. We then propose a general framework for ranking and reranking, and introduce a series of variants of the perceptron algorithm for ranking and reranking within this framework. Compared to the approach of using pairwise objects as training samples, the new algorithms reduce the data complexity and training time. We apply the new perceptron algorithms to the parse reranking and machine translation reranking tasks, and study the performance of reranking under various definitions of the margin.
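As a rough illustration of the training regime described above, here is a minimal Python sketch of a margin-based perceptron reranker that makes at most one update per n-best list rather than one per candidate pair. The function name, the oracle `gold_index` callback, and the fixed `margin` value are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def perceptron_rerank_train(candidate_lists, feats, gold_index,
                                epochs=10, margin=1.0):
        """Train a linear reranker with one perceptron update per n-best list.

        candidate_lists: list of candidate lists (e.g., n-best parses)
        feats: feats(cand) -> np.ndarray feature vector
        gold_index: gold_index(cands) -> index of the oracle-best candidate
        """
        dim = feats(candidate_lists[0][0]).shape[0]
        w = np.zeros(dim)
        for _ in range(epochs):
            for cands in candidate_lists:
                best = gold_index(cands)
                scores = [w @ feats(c) for c in cands]
                # Highest-scoring rival of the oracle candidate.
                rival = max((i for i in range(len(cands)) if i != best),
                            key=lambda i: scores[i])
                # Update only when the margin is violated: one update per
                # list, instead of one per candidate pair.
                if scores[rival] + margin > scores[best]:
                    w += feats(cands[best]) - feats(cands[rival])
        return w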
Empirical Methods in Natural Language Processing | 2003
Libin Shen; Anoop Sarkar; Aravind K. Joshi
We propose the use of Lexicalized Tree Adjoining Grammar (LTAG) as a source of features that are useful for reranking the output of a statistical parser. In this paper, we extend the notion of a tree kernel over arbitrary sub-trees of the parse to the derivation trees and derived trees provided by the LTAG formalism, and in addition we extend the original definition of the tree kernel, making it more lexicalized and more compact. We use LTAG-based features for the parse reranking task and obtain labeled recall and precision of 89.7%/90.0% on WSJ section 23 of the Penn Treebank for sentences of length ≤ 100 words. Our results show that the LTAG-based tree kernel yields a 17% relative difference in f-score improvement over a linear kernel without LTAG-based features.
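For context, the LTAG tree kernels in this paper extend the classic all-subtrees kernel. The following Python sketch shows that base kernel (Collins-Duffy style) over trees encoded as nested tuples, with a `decay` factor down-weighting large subtrees; the tuple encoding and the parameter value are illustrative choices, not the paper's lexicalized LTAG variant.

    from functools import lru_cache

    # A tree is a nested tuple: (label, child1, child2, ...); a leaf is (label,).
    def tree_kernel(t1, t2, decay=0.5):
        """Decayed count of common subtrees, in the Collins-Duffy style."""

        @lru_cache(maxsize=None)
        def common(a, b):
            # Nodes match only if they expand with the same production:
            # same label, same number of children, same child labels.
            if (a[0] != b[0] or len(a) != len(b)
                    or any(ca[0] != cb[0] for ca, cb in zip(a[1:], b[1:]))):
                return 0.0
            if len(a) == 1:                      # matching leaves
                return decay
            score = decay
            for ca, cb in zip(a[1:], b[1:]):
                score *= 1.0 + common(ca, cb)
            return score

        def nodes(t):
            out = [t]
            for child in t[1:]:
                out.extend(nodes(child))
            return out

        return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))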
North American Chapter of the Association for Computational Linguistics | 2003
Libin Shen; Aravind K. Joshi
This paper introduces a novel Support Vector Machine (SVM) based voting algorithm for reranking, which provides an indirect way to handle sequential models. We present a risk formulation under the PAC framework for this voting algorithm. Applied to the parse reranking problem, the algorithm achieves labeled recall and precision of 89.4%/89.8% on WSJ section 23 of the Penn Treebank.
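A hedged sketch of the general idea, pairwise preference learning plus voting, using scikit-learn's LinearSVC in place of the paper's kernel SVMs. The `feats` and `gold_index` callbacks are assumptions, and the symmetric difference-vector construction is one common variant, not necessarily the authors' exact setup.

    import numpy as np
    from sklearn.svm import LinearSVC  # any linear max-margin learner would do

    def train_preference_svm(candidate_lists, feats, gold_index):
        """Fit an SVM on pairwise difference vectors (better - worse)."""
        X, y = [], []
        for cands in candidate_lists:
            g = gold_index(cands)
            for i, c in enumerate(cands):
                if i == g:
                    continue
                diff = feats(cands[g]) - feats(c)
                X.extend([diff, -diff])          # symmetric +/- pairs
                y.extend([1, -1])
        return LinearSVC().fit(np.array(X), np.array(y))

    def vote_rerank(svm, cands, feats):
        """Each pairwise comparison casts a vote; most votes wins."""
        n = len(cands)
        votes = np.zeros(n)
        for i in range(n):
            for j in range(i + 1, n):
                diff = (feats(cands[i]) - feats(cands[j])).reshape(1, -1)
                if svm.decision_function(diff)[0] > 0:
                    votes[i] += 1
                else:
                    votes[j] += 1
        return int(np.argmax(votes))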
Empirical Methods in Natural Language Processing | 2009
Libin Shen; Jinxi Xu; Bing Zhang; Spyros Matsoukas; Ralph M. Weischedel
Current methods of using lexical features in machine translation have difficulty scaling up to realistic MT tasks due to the prohibitively large number of parameters involved. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context, and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, improvements in lower-cased BLEU on decoding output are 2.0 on NIST MT06 and 1.7 on MT08 newswire data. On Chinese-to-English translation, the improvements are 1.0 on MT06 and 0.8 on MT08 newswire data.
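Features like these typically enter a hierarchical decoder through a log-linear model. The sketch below shows only that combination step, with made-up feature names (`tm_logprob`, `dep_lm`, `label_match`) standing in for the paper's actual features and tuned weights.

    def derivation_score(rule_applications, weights):
        """Log-linear score of a derivation: a weighted sum of feature values.

        Each rule application carries feature values such as a translation-model
        log-probability, a source dependency-LM log-score, or a non-terminal
        label match indicator; `weights` maps feature names to tuned weights.
        """
        total = 0.0
        for feats in rule_applications:
            for name, value in feats.items():
                total += weights.get(name, 0.0) * value
        return total

    # Illustrative usage with placeholder feature names and values:
    weights = {"tm_logprob": 1.0, "dep_lm": 0.4, "label_match": 0.2}
    deriv = [{"tm_logprob": -2.3, "dep_lm": -1.1, "label_match": 1.0}]
    print(derivation_score(deriv, weights))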
Computational Linguistics | 2010
Libin Shen; Jinxi Xu; Ralph M. Weischedel
We propose a novel string-to-dependency algorithm for statistical machine translation. This algorithm employs a target dependency language model during decoding to exploit long-distance word relations, which cannot be modeled with a traditional n-gram language model. Experiments show that the algorithm achieves significant improvement in MT performance over a state-of-the-art hierarchical string-to-string system on the NIST MT06 and MT08 newswire evaluation sets.
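To make the dependency-LM idea concrete, here is a toy head-modifier bigram model in Python. The add-alpha smoothing and the (head, direction) conditioning are simplifying assumptions, far cruder than a real decoder-integrated dependency LM.

    import math
    from collections import defaultdict

    class DependencyLM:
        """A toy head-modifier bigram LM over dependency structures.

        Scores each modifier conditioned on its head plus a direction flag,
        standing in for the smoothed dependency LM used during decoding.
        """
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))
            self.totals = defaultdict(int)

        def train(self, attachments):
            # attachments: iterable of (head_word, direction, modifier_word)
            for head, direction, mod in attachments:
                self.counts[(head, direction)][mod] += 1
                self.totals[(head, direction)] += 1

        def logprob(self, head, direction, mod, alpha=0.1, vocab=10000):
            c = self.counts[(head, direction)][mod]
            t = self.totals[(head, direction)]
            return math.log((c + alpha) / (t + alpha * vocab))  # add-alpha

        def score_tree(self, deps):
            """Sum log-probs of all head->modifier attachments in a tree."""
            return sum(self.logprob(h, d, m) for h, d, m in deps)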
Empirical Methods in Natural Language Processing | 2005
Libin Shen; Aravind K. Joshi
We present a very efficient statistical incremental parser for LTAG-spinal, a variant of LTAG. The parser supports the full adjoining operation, dynamic predicate coordination, and non-projective dependencies, within a formalism of provably stronger generative capacity than CFG. Using gold-standard POS tags as input, on section 23 of the PTB, the parser achieves an f-score of 89.3% for syntactic dependencies defined on LTAG derivation trees, which are deeper than the dependencies extracted from the PTB alone with head rules (for example, in Magerman's style).
Empirical Methods in Natural Language Processing | 2008
Libin Shen; Aravind K. Joshi
In this paper, we first introduce a new architecture for parsing: bidirectional incremental parsing. We propose a novel algorithm for incremental construction, which can be applied to many structure learning problems in NLP. We apply this algorithm to LTAG dependency parsing and achieve a significant improvement in accuracy over the previous best result on the same data set.
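One way to picture non-left-to-right incremental construction is a greedy loop that always commits the single most confident attachment available anywhere in the sentence. The sketch below follows that easiest-first flavor and is only loosely inspired by the paper's algorithm; `score` is an assumed model callback, and treating word 0 as the root is a simplification.

    def easiest_first_parse(words, score):
        """Greedy bidirectional construction: repeatedly commit the
        highest-scoring attachment anywhere in the sentence, rather than
        proceeding strictly left to right.

        score(head_i, mod_j, partial) -> confidence of attaching word j
        to word i, given the partial structure built so far.
        """
        heads = {}                         # modifier index -> head index
        pending = set(range(len(words)))   # words still needing a head
        pending.discard(0)                 # word 0 acts as the root here
        while pending:
            best = max(
                ((h, m) for m in pending for h in range(len(words)) if h != m),
                key=lambda hm: score(hm[0], hm[1], heads),
            )
            heads[best[1]] = best[0]
            pending.discard(best[1])
        return heads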
Language Resources and Evaluation | 2008
Libin Shen; Lucas Champollion; Aravind K. Joshi
We introduce LTAG-spinal, a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument–adjunct distinction are left underspecified in LTAG-spinal. LTAG-spinal with adjunction constraints is weakly equivalent to LTAG. The LTAG-spinal formalism is used to extract an LTAG-spinal Treebank from the Penn Treebank with Propbank annotation. Based on Propbank annotation, predicate coordination and LTAG adjunction structures are successfully extracted. The LTAG-spinal Treebank makes explicit semantic relations that are implicit or absent from the original PTB. LTAG-spinal provides a very desirable resource for statistical LTAG parsing, incremental parsing, dependency parsing, and semantic parsing. This treebank has been successfully used to train an incremental LTAG-spinal parser and a bidirectional LTAG dependency parser.
International Joint Conference on Natural Language Processing | 2004
Libin Shen; Aravind K. Joshi
Perceptron-like large-margin algorithms are introduced for experiments with various margin selections. Compared to previous perceptron reranking algorithms, the new algorithms use full pairwise samples and allow us to search for margins in a larger space. Our experimental results on the data set of [1] show that a perceptron-like ordinal regression algorithm with uneven margins can achieve recall/precision of 89.5/90.0 on section 23 of the Penn Treebank. Our results on margin selection can be employed in other large-margin machine learning algorithms as well as in other NLP tasks.
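The uneven-margin idea can be seen in a PAUM-style binary perceptron that demands a larger margin on positive (oracle) examples than on negatives. This sketch uses illustrative `tau_pos`/`tau_neg` values and is not the exact reranking variant studied in the paper.

    import numpy as np

    def paum_train(X, y, tau_pos=1.0, tau_neg=0.1, epochs=10):
        """Perceptron with uneven margins (PAUM-style) for binary data.

        Requires a larger margin tau_pos on positive examples than tau_neg
        on negatives, which helps when positives (e.g., oracle parses) are
        rare relative to negatives.
        """
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for x, label in zip(X, y):           # label in {+1, -1}
                tau = tau_pos if label > 0 else tau_neg
                if label * (w @ x + b) <= tau:   # margin violated: update
                    w += label * x
                    b += label
        return w, b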
Meeting of the Association for Computational Linguistics | 2003
Libin Shen; Aravind K. Joshi
Supertagging is the process of assigning the correct elementary tree of LTAG, or the correct supertag, to each word of an input sentence. In this paper we propose to use supertags to expose syntactic dependencies which are unavailable with POS tags. We first propose a novel method of applying Sparse Network of Winnow (SNoW) to sequential models. We then use it to construct a supertagger that exploits long-distance syntactic dependencies, and the supertagger achieves an accuracy of 92.41%. We apply the supertagger to NP chunking. The use of supertags in NP chunking gives rise to an almost 1% absolute increase (from 92.03% to 92.95%) in F-score within a Transformation-Based Learning (TBL) framework. The supertagger described here provides an effective and efficient way to exploit syntactic information.
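SNoW is built on Winnow's multiplicative update rule. As a reference point, here is a plain binary Winnow learner over sparse 0/1 features; the promotion/demotion constants and threshold are conventional defaults, not the paper's tuned settings.

    import numpy as np

    def winnow_train(X, y, promotion=1.5, demotion=0.5, threshold=None,
                     epochs=10):
        """Multiplicative Winnow update, the learner underlying SNoW.

        X is a 0/1 feature matrix (active features per token), y in {0, 1}.
        Weights are promoted or demoted multiplicatively on mistakes, which
        suits the very sparse feature sets used in supertagging.
        """
        n_feats = X.shape[1]
        w = np.ones(n_feats)
        theta = threshold if threshold is not None else n_feats / 2.0
        for _ in range(epochs):
            for x, label in zip(X, y):
                pred = 1 if w @ x >= theta else 0
                if pred == 0 and label == 1:     # false negative: promote
                    w[x > 0] *= promotion
                elif pred == 1 and label == 0:   # false positive: demote
                    w[x > 0] *= demotion
        return w, theta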