Ee-Peng Lim

Singapore Management University

Publication

Featured research published by Ee-Peng Lim.

european conference on information retrieval | 2011

Comparing twitter and traditional media using topic models

Wayne Xin Zhao; Jing Jiang; Jianshu Weng; Jing He; Ee-Peng Lim; Hongfei Yan; Xiaoming Li

Twitter as a new form of social media can potentially contain much useful information, but content analysis on Twitter has not been well studied. In particular, it is not clear whether as an information source Twitter can be simply regarded as a faster news feed that covers mostly the same information as traditional news media. In this paper, we empirically compare the content of Twitter with a traditional news medium, the New York Times, using unsupervised topic modeling. We use a Twitter-LDA model to discover topics from a representative sample of the entire Twitter. We then use text mining techniques to compare these Twitter topics with topics from the New York Times, taking into consideration topic categories and types. We also study the relation between the proportions of opinionated tweets and retweets and topic categories and types. Our comparisons show interesting and useful findings for downstream IR or DM applications.

conference on information and knowledge management | 2010

Detecting product review spammers using rating behaviors

Ee-Peng Lim; Viet-An Nguyen; Nitin Jindal; Bing Liu; Hady Wirawan Lauw

This paper aims to detect users generating spam reviews, or review spammers. We identify several characteristic behaviors of review spammers and model these behaviors so as to detect the spammers. In particular, we seek to model the following behaviors. First, spammers may target specific products or product groups in order to maximize their impact. Second, they tend to deviate from the other reviewers in their ratings of products. We propose scoring methods to measure the degree of spam for each reviewer and apply them to an Amazon review dataset. We then select a subset of highly suspicious reviewers for further scrutiny by our user evaluators with the help of web-based spammer evaluation software specially developed for user evaluation experiments. Our results show that our proposed ranking and supervised methods are effective in discovering spammers and outperform the baseline method based on helpfulness votes alone. We finally show that the detected spammers have a more significant impact on ratings than the unhelpful reviewers.
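
The rating-deviation behavior described above can be illustrated with a small sketch. This is not the paper's scoring method; it only assumes a simple score, the mean absolute difference between a reviewer's rating and the average rating given by the other reviewers of the same product. The function name `deviation_scores` and the toy review tuples are hypothetical.

```python
from collections import defaultdict

def deviation_scores(reviews):
    """Score each reviewer by how far their ratings sit from everyone else's.

    reviews: iterable of (reviewer, product, rating) tuples (assumed format).
    Returns {reviewer: mean absolute deviation from the other reviewers' mean}.
    """
    by_product = defaultdict(list)
    for reviewer, product, rating in reviews:
        by_product[product].append((reviewer, rating))

    totals = defaultdict(float)
    counts = defaultdict(int)
    for entries in by_product.values():
        for reviewer, rating in entries:
            # Ratings of the same product by all other reviewers.
            others = [r for who, r in entries if who != reviewer]
            if not others:
                continue  # no one to compare against
            totals[reviewer] += abs(rating - sum(others) / len(others))
            counts[reviewer] += 1
    return {who: totals[who] / counts[who] for who in counts}

# Toy data: reviewer "c" consistently rates far below "a" and "b".
reviews = [
    ("a", "p1", 5), ("b", "p1", 5), ("c", "p1", 1),
    ("a", "p2", 4), ("b", "p2", 4), ("c", "p2", 1),
]
scores = deviation_scores(reviews)
most_deviant = max(scores, key=scores.get)
```

A high score only flags a reviewer as unusual; as the abstract notes, suspicious reviewers were still passed to human evaluators for scrutiny.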

Journal of Database Management | 2001

Mobile Commerce: Promises, Challenges, and Research Agenda

Keng Siau; Ee-Peng Lim; Zixing Shen

Advances in wireless technology increase the number of mobile device users and give pace to the rapid development of e-commerce using these devices. The new type of e-commerce, conducting transactions via mobile terminals, is called mobile commerce. Due to its inherent characteristics such as ubiquity, personalization, flexibility, and dissemination, mobile commerce promises businesses unprecedented market potential, great productivity, and high profitability. This paper presents an overview of mobile commerce development by examining the enabling technologies, the impact of mobile commerce on the business world, and the implications to mobile commerce providers. The paper also provides an agenda for future research in the area.

international conference on data mining | 2001

Hierarchical text classification and evaluation

Aixin Sun; Ee-Peng Lim

Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. As the standard performance measures assume independence between categories, they have not considered the documents incorrectly classified into categories that are similar to or not far from correct ones in the category tree. We therefore propose category-similarity measures and distance-based measures to consider the degree of misclassification in measuring the classification performance. An experiment has been carried out to measure the performance of our proposed hierarchical classification method. The results showed that our method performs well for a Reuters text collection when enough training documents are given and the new measures have indeed considered the contributions of misclassified documents.

conference on information and knowledge management | 2007

Measuring article quality in wikipedia: models and evaluation

Meiqun Hu; Ee-Peng Lim; Aixin Sun; Hady Wirawan Lauw; Ba-Quy Vuong

Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed based on the mutual dependency between article quality and author authority. The PeerReview model introduces the review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.

data warehousing and knowledge discovery | 1999

Research Issues in Web Data Mining

Sanjay Kumar Madria; Sourav S. Bhowmick; Wee Keong Ng; Ee-Peng Lim

In this paper, we discuss mining with respect to web data, referred to here as web data mining. In particular, our focus is on web data mining research in the context of our web warehousing project called WHOWEDA (Warehouse of Web Data). We have categorized web data mining into three areas: web content mining, web structure mining, and web usage mining. We have highlighted and discussed various research issues involved in each of these web data mining categories. We believe that web data mining will be the topic of exploratory research in the near future.

conference on information and knowledge management | 2010

Finding unusual review patterns using unexpected rules

Nitin Jindal; Bing Liu; Ee-Peng Lim

In recent years, opinion mining has attracted a great deal of research attention. However, limited work has been done on detecting opinion spam (or fake reviews). The problem is analogous to spam in Web search [1, 9 11]. However, review spam is harder to detect because it is very hard, if not impossible, to recognize fake reviews by manually reading them [2]. This paper deals with a restricted problem, i.e., identifying unusual review patterns which can represent suspicious behaviors of reviewers. We formulate the problem as finding unexpected rules. The technique is domain independent. Using the technique, we analyzed an Amazon.com review dataset and found many unexpected rules and rule groups which indicate spam activities.

IEEE Transactions on Services Computing | 2008

Dynamic Web Service Selection for Reliable Web Service Composition

San-Yih Hwang; Ee-Peng Lim; Chien-Hsiang Lee; Cheng-Hung Chen

This paper studies the dynamic web service selection problem in a failure-prone environment, which aims to determine a subset of Web services to be invoked at run-time so as to successfully orchestrate a composite web service. We observe that both the composite and constituent web services often constrain the sequences of invoking their operations and therefore propose using finite state machines to model the permitted invocation sequences of Web service operations. We assign each state of execution an aggregated reliability to measure the probability that the given state will lead to successful execution in the context where each web service may fail with some probability. We show that the computation of aggregated reliabilities is equivalent to eigenvector computation and adopt the power method to efficiently derive aggregated reliabilities. In orchestrating a composite Web service, we propose two strategies to select Web services that are likely to successfully complete the execution of a given sequence of operations. A prototype that implements the proposed approach, using BPEL to specify the invocation order of a web service, was developed and serves as a testbed for comparing our proposed strategies with other baseline Web service selection strategies.
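
The link between aggregated reliability and the power method can be sketched on a toy example. The code below is an illustration under assumptions, not the paper's formulation: execution states are modeled as a small Markov chain with absorbing "success" and "failure" states, and the reliability vector r is found by repeatedly applying the transition matrix P until r = P r (an eigenvector of P with eigenvalue 1). The function name, the matrix, and the probabilities are hypothetical.

```python
def aggregated_reliability(P, success, tol=1e-12, max_iter=10_000):
    """Power-style iteration r <- P r, started from the success indicator.

    P[i][j] is the assumed probability of moving from state i to state j.
    Returns r, where r[i] estimates the probability that state i
    eventually reaches the absorbing success state.
    """
    n = len(P)
    r = [1.0 if i == success else 0.0 for i in range(n)]
    for _ in range(max_iter):
        nxt = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r

# States: 0 = start, 1 = after first operation, 2 = success, 3 = failure.
# The first service succeeds with probability 0.9, the second with 0.8.
P = [
    [0.0, 0.9, 0.0, 0.1],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0, 0.0],  # success is absorbing
    [0.0, 0.0, 0.0, 1.0],  # failure is absorbing
]
r = aggregated_reliability(P, success=2)
```

Here `r[0]` converges to about 0.72, the product of the two success probabilities along the only path from the start state to success.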

web information and data management | 2002

Web classification using support vector machine

Aixin Sun; Ee-Peng Lim; Wee Keong Ng

In web classification, web pages from one or more web sites are assigned to pre-defined categories according to their content. Since web pages are more than just plain text documents, web classification methods have to consider using other context features of web pages, such as hyperlinks and HTML tags. In this paper, we propose the use of Support Vector Machine (SVM) classifiers to classify web pages using both their text and context feature sets. We have evaluated our web classification method on the WebKB data set. Compared with the earlier Foil-Pilfs method on the same data set, our method has been shown to perform very well. We have also shown that the use of context features, especially hyperlinks, can improve the classification performance significantly.

international acm sigir conference on research and development in information retrieval | 2007

Analyzing feature trajectories for event detection

Qi He; Kuiyu Chang; Ee-Peng Lim

We consider the problem of analyzing word trajectories in both time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely unsupervised manner. The document frequency of each word across time is treated like a time series, where each element is the document frequency - inverse document frequency (DFIDF) score at one time point. In this paper, we 1) first applied spectral analysis to categorize features for different event characteristics: important and less-reported, periodic and aperiodic; 2) modeled aperiodic features with Gaussian density and periodic features with Gaussian mixture densities, and subsequently detected each feature's burst by the truncated Gaussian approach; 3) proposed an unsupervised greedy event detection algorithm to detect both aperiodic and periodic events. All of the above methods can be applied to time series data in general. We extensively evaluated our methods on the 1-year Reuters News Corpus [3] and showed that they were able to uncover meaningful aperiodic and periodic events.
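
The spectral-analysis step can be illustrated with a minimal sketch. It assumes a word's per-day scores are already available as a plain list (the values below are made up, not DFIDF scores from the paper) and uses a direct discrete Fourier transform to find the strongest period. The function names `periodogram` and `dominant_period` are hypothetical, and the paper's actual categorization criteria are not reproduced here.

```python
import cmath

def periodogram(series):
    """Return {k: power} for frequency indices k = 1 .. n // 2.

    The mean is removed first so the constant (k = 0) component
    does not mask the periodic ones.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power[k] = abs(coeff) ** 2 / n
    return power

def dominant_period(series):
    """Period (in time steps) of the frequency with the most power."""
    power = periodogram(series)
    k = max(power, key=power.get)
    return len(series) / k

# Hypothetical word trajectory: a weekly pattern repeated for four weeks.
weekly_word = [1, 2, 3, 4, 3, 2, 1] * 4
period = dominant_period(weekly_word)
```

For this toy series the strongest component has a period of 7 time steps, which is how a weekly, periodic word would stand out from an aperiodic one whose power is spread across many frequencies.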
