Liling Tan
Nanyang Technological University
Publication
Featured research published by Liling Tan.
north american chapter of the association for computational linguistics | 2016
Liling Tan; Francis Bond; Josef van Genabith
This paper describes our submission to the SemEval-2016 Taxonomy Extraction Evaluation (TExEval-2) Task. We examine the endocentric nature of hyponyms and propose a simple rule-based method to identify hypernyms at high precision. For the food domain, we extract lists of terms from the Wikipedia lists of lists by using the name of each list as the endocentric head and treating all terms in the extracted tables as the hyponym of the endocentric head. Our submission achieved competitive results in taxonomy construction and ranked top in hypernym identification when evaluated against gold standard taxonomies and also in manual evaluation of novel relations not covered by the gold standard taxonomies.
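The endocentric heuristic described above can be sketched as a suffix-matching rule: a multiword term whose head is itself a known term takes that head as its hypernym ("apple pie" is a kind of "pie"). This is an illustrative sketch with a hypothetical vocabulary, not the authors' exact implementation.

```python
def endocentric_hypernym(term, vocab):
    """Return the hypernym of a multiword term under the endocentric
    heuristic: the longest proper suffix of the term that is a known term."""
    words = term.lower().split()
    # Try progressively shorter suffixes: "warm apple pie" -> "apple pie" -> "pie"
    for i in range(1, len(words)):
        head = " ".join(words[i:])
        if head in vocab:
            return head
    return None

vocab = {"pie", "apple pie", "cheese"}
print(endocentric_hypernym("warm apple pie", vocab))  # -> "apple pie"
print(endocentric_hypernym("goat cheese", vocab))     # -> "cheese"
```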
north american chapter of the association for computational linguistics | 2015
Liling Tan; Rohit Gupta; Josef van Genabith
This paper describes the USAAR-WLV taxonomy induction system that participated in the Taxonomy Extraction Evaluation task of SemEval-2015. We extend prior work on using vector space word embedding models for hypernym-hyponym extraction by simplifying the means to extract a projection matrix that transforms any hyponym to its hypernym. This is done by making use of function words, which are usually overlooked in vector space approaches to NLP. Our system performs best in the chemical domain and has achieved competitive results in the overall evaluations.
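The projection-matrix idea can be sketched with ordinary least squares: learn a linear map W that sends hyponym embeddings to their hypernym embeddings. The synthetic vectors and the plain least-squares fit below are assumptions for illustration; the paper's simplification derives the projection by exploiting function words, which this sketch does not reproduce.

```python
import numpy as np

def learn_projection(hyponym_vecs, hypernym_vecs):
    """Fit a linear map W (least squares) such that W @ hyponym ~= hypernym."""
    X = np.asarray(hyponym_vecs)   # (n_pairs, dim)
    Y = np.asarray(hypernym_vecs)  # (n_pairs, dim)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ B ~= Y
    return B.T                                 # apply as W @ x

# Sanity check on synthetic pairs generated by a known linear map.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(5, 5))
X = rng.normal(size=(100, 5))
W = learn_projection(X, X @ true_W.T)
assert np.allclose(W, true_W)
```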
north american chapter of the association for computational linguistics | 2015
Liling Tan; Carolina Scarton; Lucia Specia; Josef van Genabith
This paper describes the USAARSHEFFIELD systems that participated in the Semantic Textual Similarity (STS) English task of SemEval-2015. We extend the work on using machine translation evaluation metrics in the STS task. Different from previous approaches, we consider the metrics’ robustness across different text types and conflate the training data across different subcorpora. In addition, we introduce a novel deep regressor architecture and evaluate its efficiency in the STS task.
north american chapter of the association for computational linguistics | 2016
Liling Tan; Carolina Scarton; Lucia Specia; Josef van Genabith
This paper describes the SAARSHEFF systems that participated in the English Semantic Textual Similarity (STS) task in SemEval-2016. We extend the work on using machine translation (MT) metrics in the STS task by automatically annotating the STS datasets with a variety of MT scores for each pair of text snippets in the STS datasets. We trained our systems using boosted tree ensembles and achieved competitive results that outperform the median Pearson correlation scores from all participating systems.
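The boosted-tree regression setup can be sketched as follows: each text pair is represented by a vector of MT metric scores and a tree ensemble is fit to the gold similarity. The stump-based booster below is a minimal stand-in for an actual ensemble library, and the feature values are synthetic, not real MT scores.

```python
import numpy as np

def fit_boosted_stumps(X, y, n_rounds=50, lr=0.1):
    """Minimal gradient boosting for squared loss: each round fits a
    one-split decision stump to the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        resid = y - pred
        best = None
        for j in range(X.shape[1]):                # candidate feature
            for t in np.unique(X[:, j])[:-1]:      # candidate threshold
                left = X[:, j] <= t
                lv, rv = resid[left].mean(), resid[~left].mean()
                sse = ((resid[left] - lv) ** 2).sum() + \
                      ((resid[~left] - rv) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, lr * lv, lr * rv)
        _, j, t, lv, rv = best
        stumps.append((j, t, lv, rv))
        pred += np.where(X[:, j] <= t, lv, rv)
    return base, stumps

def predict_boosted(model, X):
    base, stumps = model
    pred = np.full(len(X), base, dtype=float)
    for j, t, lv, rv in stumps:
        pred += np.where(X[:, j] <= t, lv, rv)
    return pred

# Synthetic "MT metric" features; the target mimics a gold similarity score.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))
y = 2.0 * X[:, 0] + X[:, 1]
model = fit_boosted_stumps(X, y)
mse = ((predict_boosted(model, X) - y) ** 2).mean()
assert mse < y.var()  # boosting beats the constant-mean baseline on training data
```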
workshop on statistical machine translation | 2015
Carolina Scarton; Liling Tan; Lucia Specia
We present the results of the USHEF and USAAR-USHEF submissions for the WMT15 shared task on document-level quality estimation. The USHEF submissions explored several document and discourse-aware features. The USAAR-USHEF submissions used an exhaustive search approach to select the best features from the official baseline. Results show slight improvements over the baseline with the use of discourse features. More interestingly, we found that a model of comparable performance can be built with only three features selected by the exhaustive search procedure.
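The exhaustive search over baseline features can be sketched with itertools: score every non-empty feature subset and keep the best one. The scoring function below is a toy stand-in for the task's actual quality-estimation metric, and the feature names are hypothetical.

```python
from itertools import combinations

def exhaustive_search(features, score):
    """Try every non-empty subset of features and keep the best-scoring one."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score

# Toy scorer: rewards one specific 3-feature combination, penalizes size.
target = {"f1", "f4", "f7"}
score = lambda subset: len(target & set(subset)) - 0.1 * len(subset)
best, _ = exhaustive_search([f"f{i}" for i in range(1, 9)], score)
print(sorted(best))  # -> ['f1', 'f4', 'f7']
```

Exhaustive search is exponential in the number of features, which is why it is only feasible over a small official baseline feature set.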
north american chapter of the association for computational linguistics | 2016
Jon Dehdari; Liling Tan; Josef van Genabith
Word clusters improve performance in many NLP tasks including training neural network language models, but current increases in datasets are outpacing the ability of word clusterers to handle them. In this paper we present a novel bidirectional, interpolated, refining, and alternating (BIRA) predictive exchange algorithm and introduce ClusterCat, a clusterer based on this algorithm. We show that ClusterCat is 3-85 times faster than four other well-known clusterers, while also improving upon the predictive exchange algorithm’s perplexity by up to 18%. Notably, ClusterCat clusters a 2.5 billion token English News Crawl corpus in 3 hours. We also evaluate in a machine translation setting, resulting in shorter training times while achieving the same translation quality measured in BLEU scores. ClusterCat is portable and freely available.
north american chapter of the association for computational linguistics | 2016
Jon Dehdari; Liling Tan; Josef van Genabith
Word clusters are useful for many NLP tasks including training neural network language models, but current increases in datasets are outpacing the ability of word clusterers to handle them. Little attention has been paid thus far on inducing high-quality word clusters at a large scale. The predictive exchange algorithm is quite scalable, but sometimes does not provide as good perplexity as other slower clustering algorithms. We introduce the bidirectional, interpolated, refining, and alternating (BIRA) predictive exchange algorithm. It improves upon the predictive exchange algorithm’s perplexity by up to 18%, giving it perplexities comparable to the slower two-sided exchange algorithm, and better perplexities than the slower Brown clustering algorithm. Our BIRA implementation is fast, clustering a 2.5 billion token English News Crawl corpus in 3 hours. It also reduces machine translation training time while preserving translation quality. Our implementation is portable and freely available.
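At its core, exchange clustering greedily reassigns each word to whichever class most improves a class bigram likelihood. The sketch below uses the classic two-sided exchange objective and naively rescores the full objective per move; it does not reproduce BIRA's bidirectional, interpolated refinements, and real clusterers update counts incrementally rather than rescoring from scratch.

```python
from collections import Counter
import math

def objective(bigrams, word2cls):
    """Two-sided class bigram objective (up to a constant):
    sum_{c1,c2} N(c1,c2)*log N(c1,c2) - 2 * sum_c N(c)*log N(c)."""
    bi, uni = Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        bi[(word2cls[w1], word2cls[w2])] += n
        uni[word2cls[w1]] += n
    return (sum(n * math.log(n) for n in bi.values())
            - 2 * sum(n * math.log(n) for n in uni.values()))

def exchange_pass(bigrams, word2cls, classes):
    """One pass: move each word to the class that maximizes the objective."""
    for w in list(word2cls):
        word2cls[w] = max(classes,
                          key=lambda c: objective(bigrams, {**word2cls, w: c}))
    return word2cls

text = "the cat sat on the mat and the dog sat on the rug".split()
bigrams = Counter(zip(text, text[1:]))
word2cls = {w: i % 2 for i, w in enumerate(sorted(set(text)))}
before = objective(bigrams, word2cls)
after = objective(bigrams, exchange_pass(bigrams, word2cls, [0, 1]))
assert after >= before  # each greedy move cannot decrease the objective
```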
international conference on computational linguistics | 2014
Liling Tan; Anne Schumann; José Manuel Martínez Martínez; Francis Bond
This paper describes the Post-Editor Z system submitted to the L2 writing assistant task in SemEval-2014. The aim of the task is to build a translation assistance system to translate untranslated sentence fragments. This is not unlike the task of post-editing where human translators improve machine-generated translations. Post-Editor Z emulates the manual process of post-editing by (i) crawling and extracting parallel sentences that contain the untranslated fragments from a Web-based translation memory, (ii) extracting the possible translations of the fragments indexed by the translation memory and (iii) applying simple cosine-based sentence similarity to rank possible translations for the untranslated fragment.
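Step (iii) can be sketched as bag-of-words cosine ranking over candidate translations. The candidate strings below are hypothetical; the real system ranks translations retrieved from the translation memory.

```python
from collections import Counter
import math

def cosine_sim(a, b):
    """Cosine similarity between two sentences as bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_translations(query, candidates):
    """Rank candidate translations by cosine similarity to the query context."""
    return sorted(candidates, key=lambda c: cosine_sim(query, c), reverse=True)

ranked = rank_translations("the red house", ["a red house", "green tree"])
print(ranked[0])  # -> "a red house"
```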
north american chapter of the association for computational linguistics | 2016
Hannah Bechara; Rohit Gupta; Liling Tan; Constantin Orasan; Ruslan Mitkov; Josef van Genabith
This paper describes the WOLVESAAR systems that participated in the English Semantic Textual Similarity (STS) task in SemEval-2016. We replicated the top systems from the last two editions of the STS task and extended the model using GloVe word embeddings and dense vector space LSTM based sentence representations. We compared the difference in performance of the replicated system and the extended variants. Our variants to the replicated system show improved correlation scores and all of our submissions outperform the median scores from all participating systems.
north american chapter of the association for computational linguistics | 2016
José Manuel Martínez Martínez; Liling Tan
This paper describes an information-theoretic approach to complex word identification using a classifier built on an entropy-based measure over word senses and on sentence-level perplexity features. We describe the motivation behind these features in terms of information density and demonstrate that they perform modestly well in the complex word identification task in SemEval-2016. We also discuss possible improvements for future work, such as exploring the subjectivity of word complexity and more robust evaluation metrics for the complex word identification task.
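The entropy-based sense measure can be sketched as Shannon entropy over a word's sense distribution: a word with many roughly equiprobable senses scores high, which serves as one cue for complexity. The sense distributions below are illustrative assumptions, not values drawn from a real sense inventory.

```python
import math

def sense_entropy(sense_probs):
    """Shannon entropy (bits) of a word's sense distribution; higher
    entropy means the word's senses are more evenly spread."""
    return -sum(p * math.log2(p) for p in sense_probs if p > 0)

# Hypothetical sense distributions (not from WordNet):
print(sense_entropy([1.0]))        # monosemous word -> 0.0 bits
print(sense_entropy([0.25] * 4))   # four equiprobable senses -> 2.0 bits
```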