Mengwen Liu
Drexel University
Publication
Featured research published by Mengwen Liu.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Dae Hoon Park; Mengwen Liu; ChengXiang Zhai; Haohong Wang
Smartphones and tablets, with their apps, have pervaded our everyday life, leading to a new demand for search tools that help users find the right apps to satisfy their immediate needs. While there are a few commercial mobile app search engines available, the new task of mobile app retrieval has not yet been rigorously studied. Indeed, there does not yet exist a test collection for quantitatively evaluating this new retrieval task. In this paper, we first study the effectiveness of state-of-the-art retrieval models for the app retrieval task using a new app retrieval test data set we created. We then propose and study a novel approach that generates a new representation for each app. Our key idea is to leverage user reviews to find important features of apps and bridge the vocabulary gap between app developers and users. Specifically, we jointly model app descriptions and user reviews using a topic model in order to generate app representations while excluding noise in reviews. Experimental results indicate that the proposed approach is effective and outperforms state-of-the-art retrieval models for app retrieval.
Bioinformatics and Biomedicine | 2014
Mengwen Liu; Yuan Ling; Yuan An; Xiaohua Hu
We develop a novel distantly supervised model that integrates the results of open information extraction techniques to perform relation extraction from biomedical literature. Unlike state-of-the-art models for relation extraction in the biomedical domain, which are mainly based on supervised methods, our approach does not require manually labeled instances. In addition, our model incorporates a grouping strategy to take into consideration the coordinating structure among entities co-occurring in one sentence. We apply our approach to extract gene expression relationships between genes and brain regions from the literature. Results show that our method achieves promising performance over two baselines: the Transductive Support Vector Machine and the non-grouping strategy.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016
Mengwen Liu; Yi Fang; Dae Hoon Park; Xiaohua Hu; Zhengtao Yu
Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address and they may further read the review only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question retrieval and question diversification. We first propose probabilistic retrieval models to locate candidate questions that are relevant to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which results in an efficient greedy algorithm of submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.
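The greedy step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's set function, so the example below swaps in a plain coverage function (the number of distinct review aspects touched by the selected questions), which is one standard monotone submodular objective. The names `coverage`, `greedy_select`, and the toy question and aspect labels are all hypothetical.

```python
from typing import Dict, List, Set


def coverage(selected: List[str], covers: Dict[str, Set[str]]) -> int:
    """Count the distinct aspects covered by the selected questions.

    Coverage is monotone and submodular; it stands in for the paper's
    set function, which the abstract does not specify.
    """
    covered: Set[str] = set()
    for question in selected:
        covered |= covers[question]
    return len(covered)


def greedy_select(covers: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k questions, each time adding the largest marginal gain."""
    selected: List[str] = []
    remaining = sorted(covers)  # sorted so that ties break deterministically
    while remaining and len(selected) < k:
        base = coverage(selected, covers)
        gains = {q: coverage(selected + [q], covers) - base for q in remaining}
        best = max(remaining, key=lambda q: gains[q])
        if gains[best] == 0:
            break  # no remaining question adds a new aspect
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy candidate questions mapped to the review aspects they touch (made up).
covers = {
    "q1": {"battery", "charging"},
    "q2": {"battery"},
    "q3": {"screen"},
}
print(greedy_select(covers, 2))
```

In this toy run the redundant question `q2` is skipped because it adds no aspect beyond what `q1` already covers. For monotone submodular objectives under a size limit, this kind of greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is why such set functions pair naturally with a greedy algorithm.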
Bioinformatics and Biomedicine | 2013
Mengwen Liu; Yuan An; Xiaohua Hu; Debra Langer; Craig J. Newschaffer; Lindsay Shea
The rising prevalence of Autism Spectrum Disorder (ASD) in the United States points to an increased need for services across the life span. Specialized services beginning at the earliest age possible are critical to maximizing long-term outcomes for children with ASD and their families. Many children later diagnosed with ASD will begin to receive services through the federally funded Early Intervention (EI) system that serves infants and toddlers from birth to age three. However, without formal recognition, services may not fully address the constellation of ASD symptoms. While ASD training in EI is becoming more widespread, there is still a need for better detection of ASD symptoms at younger ages. We hypothesized that initial EI assessment records, which document the strengths and needs of children in EI, could be an important source for detecting ASD warning signs and could aid state EI systems in earlier identification. In this research, we used EI records to evaluate classification techniques to identify suspected ASD cases. We improved the performance of machine learning techniques by developing and applying a unified ASD ontology to identify the most relevant features from EI records. The results indicate that using a Support Vector Machine (SVM) with ontology-based unigrams as features yields the best performance. Our study shows that developing automatic approaches for quickly and effectively detecting suspected cases of ASD from non-standardized EI records, earlier than most ASD cases are typically detected, is promising.
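As a rough illustration of what "ontology-based unigrams" could look like as features, the sketch below keeps only the tokens of a record that appear in a term list and counts them. The function name `ontology_unigrams`, the tokenization rule, and the sample terms are assumptions made for the example; they are not taken from the paper's ontology, and the classifier itself is omitted.

```python
import re
from collections import Counter
from typing import Set


def ontology_unigrams(text: str, ontology_terms: Set[str]) -> Counter:
    """Count only those unigrams of the record that occur in the ontology.

    Tokens are lowercased alphabetic runs; this tokenization is an
    assumption made for the sketch.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in ontology_terms)


# Placeholder terms and record text, invented for the example.
terms = {"eye", "pointing", "babbling"}
record = "Limited eye contact; no pointing. Eye gaze brief."
print(ontology_unigrams(record, terms))
```

The resulting counts could then be turned into a feature vector for a classifier such as an SVM. Restricting the vocabulary this way discards words that the ontology does not consider relevant.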
Bioinformatics and Biomedicine | 2013
Yuan Ling; Yuan An; Mengwen Liu; Xiaohua Hu
We develop an error detection and tagging framework for reducing data entry errors in Electronic Medical Records (EMR) systems. We propose a taxonomy of data errors with three levels: Incorrect Format and Missing error, Out of Range error, and Inconsistent error. We aim to address the challenging problem of detecting erroneous input values that look statistically normal but are abnormal in a medical sense. Detecting such an error requires taking the patient's medical history and population data into consideration. In particular, we propose a probabilistic method based on the assumption that the input value for a field depends on the historical records of this field and is affected by other fields through dependency relationships. We evaluate our methods using data collected from an EMR system. The results show that the method is promising for automatic data entry error detection.
International Symposium on Neural Networks | 2017
Yuan Ling; Yuan An; Mengwen Liu; Sadid A. Hasan; Yetian Fan; Xiaohua Hu
Word embedding in the NLP area has attracted increasing attention in recent years. The continuous bag-of-words model (CBOW) and the continuous Skip-gram model (Skip-gram) have been developed to learn distributed representations of words from a large amount of unlabeled text data. In this paper, we explore the idea of integrating extra knowledge into the CBOW and Skip-gram models and applying the new models to biomedical NLP tasks. The main idea is to construct a weighted graph from knowledge bases (KBs) to represent structured relationships among words/concepts. In particular, we propose a GCBOW model and a GSkip-gram model by integrating such a graph into the original CBOW model and Skip-gram model, respectively, via graph regularization. Our experiments on four general-domain standard datasets show encouraging improvements with the new models. Further evaluations on two biomedical NLP tasks (a biomedical similarity/relatedness task and a biomedical Information Retrieval (IR) task) show that our methods perform better than the baselines.
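Graph regularization is commonly written as a penalty that pulls the vectors of linked nodes together, for example the sum over edges of w_ij * ||v_i - v_j||^2. The abstract does not state the exact term used in GCBOW or GSkip-gram, so the sketch below assumes that common form and shows only the regularizer with one gradient step, not the CBOW or Skip-gram loss it would be added to. The function names and toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Embeddings = Dict[str, List[float]]
Edges = Dict[Tuple[str, str], float]


def graph_penalty(emb: Embeddings, edges: Edges) -> float:
    """Sum of w_ij * ||v_i - v_j||^2 over the weighted knowledge-base edges."""
    total = 0.0
    for (a, b), weight in edges.items():
        total += weight * sum((x - y) ** 2 for x, y in zip(emb[a], emb[b]))
    return total


def regularization_step(emb: Embeddings, edges: Edges, lr: float = 0.1) -> Embeddings:
    """One gradient-descent step on the penalty alone (assumed form)."""
    grads = {word: [0.0] * len(vec) for word, vec in emb.items()}
    for (a, b), weight in edges.items():
        for d, (x, y) in enumerate(zip(emb[a], emb[b])):
            g = 2.0 * weight * (x - y)
            grads[a][d] += g  # derivative with respect to v_a
            grads[b][d] -= g  # derivative with respect to v_b
    return {
        word: [x - lr * g for x, g in zip(vec, grads[word])]
        for word, vec in emb.items()
    }


# Toy vectors; only the two linked words should move toward each other.
emb = {"aspirin": [1.0, 0.0], "ibuprofen": [0.0, 1.0], "table": [5.0, 5.0]}
edges = {("aspirin", "ibuprofen"): 1.0}
updated = regularization_step(emb, edges)
print(graph_penalty(emb, edges), graph_penalty(updated, edges))
```

In a full model this penalty would be weighted and added to the embedding objective, so that text co-occurrence and knowledge-base structure both shape the vectors.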
Information Retrieval Journal | 2017
Mengwen Liu; Yi Fang; Alexander G. Choulos; Dae Hoon Park; Xiaohua Hu
Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address and they may further read the review only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question selection and question diversification. For the question selection phase, we first employ probabilistic retrieval models to locate candidate questions that are relevant to a given review. A Recurrent Neural Network Encoder–Decoder is utilized to measure the “answerability” of questions to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which results in an efficient greedy algorithm of submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.
Information Retrieval | 2017
Mengwen Liu; Wanying Ding; Dae Hoon Park; Yi Fang; Rui Yan; Xiaohua Hu
A number of online marketplaces enable customers to buy or sell used products, which raises the need for ranking tools to help them find desirable items among a huge pool of choices. To the best of our knowledge, no prior work in the literature has investigated the task of used product ranking, which has unique characteristics compared with regular product ranking. While there exist a few ranking metrics (e.g., price, conversion probability) that measure the “goodness” of a product, they do not consider the time factor, which is crucial in used product trading because each used product is often unique while new products are usually abundant in supply. In this paper, we introduce a novel time-aware metric, “sellability”, which is defined as the time duration for a used item to be traded, to quantify its value. In order to estimate the “sellability” values for newly generated used products and to present users with a ranked list of the most relevant results, we propose a combined Poisson regression and listwise ranking model. The model fits the distribution of “sellability” well. In addition, the model is designed to optimize loss functions for regression and ranking simultaneously, which differs from previous approaches that are conventionally learned with a single cost function, i.e., regression or ranking. We evaluate our approach in the domain of used vehicles. Experimental results show that the proposed model can improve both regression and ranking performance compared with non-machine learning and machine learning baselines.
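One way to picture a joint regression and ranking objective is a weighted sum of a Poisson negative log-likelihood and a listwise loss. The abstract does not give the paper's loss functions, so the sketch below assumes a log-link Poisson term and a ListNet-style top-one cross-entropy, mixed by a hypothetical weight `alpha`. It operates directly on predicted log-rates instead of learning them from features, and all names are illustrative.

```python
import math
from typing import List


def poisson_nll(log_rates: List[float], durations: List[float]) -> float:
    """Mean Poisson negative log-likelihood with a log link (rate = exp(f))."""
    terms = [
        math.exp(f) - y * f + math.lgamma(y + 1.0)
        for f, y in zip(log_rates, durations)
    ]
    return sum(terms) / len(terms)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(pred_scores: List[float], target_scores: List[float]) -> float:
    """ListNet-style cross-entropy between top-one probability distributions."""
    target = softmax(target_scores)
    pred = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def combined_loss(log_rates: List[float], durations: List[float],
                  alpha: float = 0.5) -> float:
    """Weighted sum of the regression and ranking terms (assumed form)."""
    # Items that sell faster should rank higher, so durations are negated.
    ranking = listwise_loss([-f for f in log_rates], [-y for y in durations])
    return alpha * poisson_nll(log_rates, durations) + (1.0 - alpha) * ranking


# Observed days-to-sale for three listings (toy values).
durations = [2.0, 10.0, 30.0]
good = [math.log(y) for y in durations]  # predictions matching the data
bad = list(reversed(good))               # same values, wrong listings
print(combined_loss(good, durations), combined_loss(bad, durations))
```

The well-matched predictions yield the lower combined loss, since both the regression term and the ranking term penalize the swapped predictions.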
International Conference on the Theory of Information Retrieval | 2016
Yi Fang; Mengwen Liu
Learning to Rank (L2R) has emerged as one of the core machine learning techniques for IR. On the other hand, Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. They have produced impressive results in many computer vision and speech recognition tasks. In this paper, we introduce a unified view of Learning to Rank that integrates various L2R approaches in an energy-based ranking framework. In this framework, an energy function associates low energies to desired documents and high energies to undesired results. Learning is essentially the process of shaping the energy surface so that desired documents have lower energies. The proposed framework yields new insights into learning to rank. First, we show how various existing L2R models (pointwise, pairwise, and listwise) can be cast in the energy-based framework. Second, new L2R models can be constructed based on existing EBMs. Furthermore, inspired by the intuitive learning process of EBMs, we can devise novel energy-based models for ranking tasks. We introduce several new energy-based ranking models based on the proposed framework. The experiments are conducted on the public LETOR 4.0 benchmarks and demonstrate the effectiveness of the proposed models.
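The idea of shaping an energy surface so that desired documents receive lower energies can be sketched with a pairwise margin loss. This is only an illustration under stated assumptions: the linear energy E(x) = -w·x, the hinge form, and the names `energy`, `hinge_energy_loss`, and `sgd_step` are chosen for the example and are not the models proposed in the paper.

```python
from typing import List


def energy(w: List[float], x: List[float]) -> float:
    """Linear energy: a higher score w.x corresponds to a lower energy."""
    return -sum(wi * xi for wi, xi in zip(w, x))


def hinge_energy_loss(w: List[float], x_pos: List[float], x_neg: List[float],
                      margin: float = 1.0) -> float:
    """Penalize cases where the desired document is not lower by the margin."""
    return max(0.0, margin + energy(w, x_pos) - energy(w, x_neg))


def sgd_step(w: List[float], x_pos: List[float], x_neg: List[float],
             lr: float = 0.1, margin: float = 1.0) -> List[float]:
    """One subgradient step that pushes E(x_pos) down and E(x_neg) up."""
    if hinge_energy_loss(w, x_pos, x_neg, margin) == 0.0:
        return list(w)  # margin already satisfied, nothing to update
    # Gradient of E(x_pos) - E(x_neg) with respect to w is x_neg - x_pos.
    return [wi - lr * (xn - xp) for wi, xp, xn in zip(w, x_pos, x_neg)]


# Toy feature vectors for a desired and an undesired document.
w = [0.0, 0.0]
x_pos, x_neg = [1.0, 0.0], [0.0, 1.0]
w_new = sgd_step(w, x_pos, x_neg)
print(hinge_energy_loss(w, x_pos, x_neg), hinge_energy_loss(w_new, x_pos, x_neg))
```

After one step the desired document has the lower energy and the loss has decreased. Pointwise and listwise variants would swap in a different loss over the same energy function.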
International Joint Conference on Natural Language Processing | 2015
Rui Yan; Xiang Li; Mengwen Liu; Xiaohua Hu
Online social networks now enjoy worldwide popularity, as they have revolutionized the way people discover, share, and diffuse information. Social networks are powerful, yet they still have an Achilles' heel: extreme data sparsity. Individual posts (e.g., a microblog of fewer than 140 characters) seem too sparse to be told apart under various scenarios, while in fact they are quite different. We propose to tackle this specific weakness of social networks by smoothing the language model of each post based on social regularization. We formulate an optimization framework with a social regularizer. Experimental results on a Twitter dataset validate the effectiveness and efficiency of our proposed model.