Ling Chen
Nanyang Technological University
Publication
Featured research published by Ling Chen.
web search and data mining | 2009
Ling Chen; Phillip Wright; Wolfgang Nejdl
As a fundamental and critical component of music information retrieval (MIR) systems, music genre classification has attracted considerable research attention. Automatically classifying music by genre is, however, a challenging problem because music is an evolving art. While most existing work categorizes music using features extracted from music audio signals, in this paper, we propose to exploit the semantic information embedded in tags supplied by users of social networking websites. In particular, we consider the tag information by creating a graph of tracks so that tracks are neighbors if they are similar in terms of their associated tags. Two classification methods based on the track graph are developed. The first one employs a classification scheme which simultaneously considers the audio content and neighborhood of tracks. In contrast, the second one is a two-level classifier which initializes genre labels for unknown tracks using their audio content, and then iteratively updates the genres considering the influence from their neighbors. A set of optimizing strategies is designed to further enhance the quality of the two-level classifier. Extensive experiments are conducted on real-world data collected from Last.fm. Promising experimental results demonstrate the benefit of using tags for accurate music genre classification.
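The two-level idea in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how tag similarity is measured or how neighbor influence is combined, so the Jaccard similarity, the `threshold` parameter, and the majority-vote update are all hypothetical choices, and the audio-based initial labels are simply passed in as a dictionary.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two tag collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_track_graph(tags, threshold=0.5):
    """Link two tracks when their tag sets are at least `threshold` similar."""
    ids = sorted(tags)
    graph = {t: set() for t in ids}
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if jaccard(tags[u], tags[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def relabel(graph, initial, known, max_iters=10):
    """Iteratively update unknown tracks by majority vote of their neighbors.

    `initial` holds the audio-based guesses for unknown tracks and `known`
    holds fixed labels; together they must cover every track in `graph`.
    """
    labels = dict(initial)
    labels.update(known)
    for _ in range(max_iters):
        new_labels = dict(labels)
        changed = False
        for track, neighbors in graph.items():
            if track in known or not neighbors:
                continue
            votes = Counter(labels[n] for n in neighbors)
            top, top_count = votes.most_common(1)[0]
            # Switch only when another genre strictly outvotes the current one.
            if votes[labels[track]] < top_count:
                new_labels[track] = top
                changed = True
        labels = new_labels
        if not changed:
            break
    return labels
```

A track whose audio-based guess disagrees with all of its tag neighbors is pulled toward their genre, while a track with no neighbors keeps its audio-based label.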
international workshop on data mining and audience intelligence for advertising | 2007
Avaré Stewart; Ling Chen; Raluca Paiu; Wolfgang Nejdl
Allowing global distribution of information to large audiences at very low cost, the Internet has emerged as a vital medium for marketing and advertising. Weblogs, a new form of self-publication on the Internet, have attracted online advertisers because of their incredible growth rate in recent years. In this paper, we propose to discover information diffusion paths from the blogosphere to track how information frequently flows from blog to blog. This knowledge can be used in various online campaign applications. Our approach is based on analyzing the content of blogs. After detecting trackable topics of blogs, we model a blog community as a blog sequence database. Then, the discovery of information diffusion paths is formalized as a problem of frequent pattern mining. We develop a new data mining algorithm to discover information diffusion paths. Experiments conducted on a real-life dataset show that our algorithm discovers information diffusion paths efficiently. The discovered information diffusion paths are accurate in predicting the future information flow in the blog community.
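To make the frequent-pattern formulation concrete, here is a brute-force sketch, not the paper's algorithm. It assumes that each entry of the blog sequence database is the time-ordered list of blogs that discussed one topic, and that a diffusion path is an ordered subsequence of blogs supported by at least `min_support` topics; the function name and parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_paths(sequences, min_support, max_len=3):
    """Count ordered blog subsequences and keep those seen in enough topics."""
    counts = Counter()
    for seq in sequences:
        # Keep only the first time each blog mentions the topic.
        ordered = list(dict.fromkeys(seq))
        for k in range(2, max_len + 1):
            # combinations() preserves the input order, so each tuple is a
            # candidate path "blog x, later blog y, later blog z".
            counts.update(combinations(ordered, k))
    return {path: c for path, c in counts.items() if c >= min_support}


topics = [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "D"], ["B", "A"]]
print(frequent_paths(topics, min_support=3))  # {('A', 'B'): 3}
```

Enumerating every subsequence grows quickly with sequence length, which is exactly why a dedicated mining algorithm is needed on real data.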
data and knowledge engineering | 2006
Ling Chen; Sourav S. Bhowmick; Liang-Tien Chia
In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, no existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pattern, namely, FRequently And Concurrently muTating substructUREs (FRACTURE), that is mined from the evolution of an XML document. A discovered FRACTURE is a set of substructures of an XML document that frequently change together. Knowledge obtained from FRACTUREs is useful in applications such as XML indexing and XML clustering. In order to keep the resulting patterns concise and explicit, we further formulate the problem of maximal FRACTURE mining. Two algorithms, which employ the level-wise and divide-and-conquer strategies respectively, are designed to mine the set of FRACTUREs. The second algorithm, which is more efficient, is also optimized to discover the set of maximal FRACTUREs. Experiments involving a wide range of synthetic and real-life datasets verify the efficiency and scalability of the developed algorithms.
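The level-wise strategy mentioned above can be illustrated with a generic Apriori-style sketch. It is a simplification under assumptions: each delta between consecutive versions is reduced to the set of substructure identifiers that changed, every change counts equally, and support is an absolute count. The identifiers and the function name are hypothetical, and the paper's own metrics may differ.

```python
from itertools import combinations


def frequent_cochange_sets(deltas, min_support):
    """Level-wise search for substructure sets that change together often."""
    deltas = [frozenset(d) for d in deltas]

    def support(itemset):
        return sum(1 for d in deltas if itemset <= d)

    items = sorted({x for d in deltas for x in d})
    level = [frozenset([x]) for x in items
             if support(frozenset([x])) >= min_support]
    result = {}
    while level:
        for s in level:
            result[s] = support(s)
        k = len(level[0]) + 1
        current = set(level)
        # Join frequent (k-1)-sets into k-set candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with an infrequent subset, then check support.
        level = [c for c in candidates
                 if all(frozenset(sub) in current
                        for sub in combinations(c, k - 1))
                 and support(c) >= min_support]
    return result
```

Each returned key is a set of substructures and each value is the number of deltas in which all of them changed together.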
pacific-asia conference on knowledge discovery and data mining | 2004
Ling Chen; Sourav S. Bhowmick; Liang-Tien Chia
Previous work on XML association rule mining focuses on mining from the data existing in XML documents at a certain time point. However, due to the dynamic nature of online information, an XML document typically evolves over time. Knowledge obtained from mining the evolution of an XML document would be useful in a wide range of applications, such as XML indexing and XML clustering. In this paper, we propose to mine a novel type of association rule from a sequence of changes to XML structure, which we call XML Structural Delta Association Rule (XSD-AR). We formulate the problem of XSD-AR mining by considering both the frequency and the degree of changes to XML structure. An algorithm derived from FP-growth, together with its optimizing strategy, is developed for the problem. Preliminary experimental results show that our algorithm is efficient and scalable at discovering a complete set of XSD-ARs.
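As a rough illustration of what a rule over structural deltas expresses, the sketch below computes the classic support and confidence of "when these substructures change, those change too". It is a simplification: the abstract says XSD-AR mining also weighs the degree of change, which is ignored here, and the function and parameter names are hypothetical.

```python
def rule_stats(deltas, antecedent, consequent):
    """Support and confidence of `antecedent -> consequent` over change sets."""
    antecedent, consequent = set(antecedent), set(consequent)
    both = sum(1 for d in deltas if (antecedent | consequent) <= set(d))
    ante = sum(1 for d in deltas if antecedent <= set(d))
    support = both / len(deltas) if deltas else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence
```

For four deltas in which substructure `a` changes three times and `a` and `b` change together twice, the rule `a -> b` has support 0.5 and confidence 2/3.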
data and knowledge engineering | 2009
Ling Chen; Sourav S. Bhowmick; Wolfgang Nejdl
As one of the most important tasks of Web Usage Mining (WUM), web user clustering, which establishes groups of users exhibiting similar browsing patterns, provides useful knowledge to personalized web services and motivates long-term research interest in the web community. Most of the existing approaches cluster web users based on snapshots of web usage data, although web usage data are evolutionary in nature. Consequently, the usefulness of the knowledge discovered by existing web user clustering approaches might be limited. In this paper, we address this problem by clustering web users based on the evolution of web usage data. Given a set of web users and their associated historical web usage data, we study how their usage data change over time and mine evolutionary patterns from each user's usage history. The discovered patterns capture the characteristics of changes to a web user's information needs. We can then cluster web users by analyzing common and similar evolutionary patterns shared by users. Web user clusters generated in this way provide novel and useful knowledge for various personalized web applications, including web advertisement and web caching.
data warehousing and knowledge discovery | 2004
Ling Chen; Sourav S. Bhowmick; Liang-Tien Chia
Due to the dynamic nature of online information, XML documents typically evolve over time. The change of the data values or structures of an XML document may exhibit some particular patterns. In this paper, we focus on the sequence of changes to the structures of an XML document to find out which subtrees in the XML structure frequently change together, which we call Frequently Changing Subtree Patterns (FCSP). In order to keep the discovered patterns concise, we further define the problem of mining maximal FCSPs. An algorithm derived from FP-growth is developed to mine the set of maximal FCSPs. Experimental results show that our algorithm is substantially faster than the naive algorithm and that it scales well with respect to the size of the XML structure.
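The notion of a maximal pattern can be shown with a small post-filter: a frequent pattern is maximal when no other frequent pattern strictly contains it. This sketch assumes patterns are represented as plain sets of subtree identifiers (a hypothetical encoding) and filters an already mined collection, whereas the paper mines maximal FCSPs directly.

```python
def maximal_sets(frequent):
    """Keep only patterns that are not a proper subset of another pattern."""
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets if not any(s < other for other in sets)]
```

Given the frequent patterns `{a}`, `{b}`, `{a, b}` and `{c}`, only `{a, b}` and `{c}` survive, which is the conciseness benefit the abstract refers to.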
database systems for advanced applications | 2005
Ling Chen; Sourav S. Bhowmick; Liang-Tien Chia
Recently, several approaches that mine frequent XML query patterns and cache their results have been proposed to improve query response time. However, frequent XML query patterns mined by these approaches ignore the temporal sequence between user queries. In this paper, we take into account the temporal features of user queries to discover association rules, which indicate that when a user requests some information from the XML document, she or he will probably request some other information subsequently. We first cluster XML queries according to their semantics and then mine association rules between the clusters. Moreover, both positive and negative association rules are discovered to design an appropriate cache replacement strategy. The experimental results show that our approach considerably improves caching performance by significantly reducing the query response time.
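One way to picture positive and negative rules between query clusters is sketched below. The abstract does not say how rules are scored, so everything here is an assumption: sessions are lists of cluster labels in issue order, a rule `a -> b` is scored by the fraction of times `a` is immediately followed by `b`, and the thresholds `pos_conf`, `neg_conf`, and `min_count` are hypothetical.

```python
from collections import Counter


def sequential_rules(sessions, pos_conf=0.6, neg_conf=0.1, min_count=2):
    """Split cluster pairs into positive and negative sequential rules."""
    first = Counter()  # how often a cluster is followed by any query
    pair = Counter()   # how often cluster a is immediately followed by b
    for session in sessions:
        for a, b in zip(session, session[1:]):
            first[a] += 1
            pair[(a, b)] += 1
    clusters = {c for session in sessions for c in session}
    positive, negative = {}, {}
    for a, total in first.items():
        if total < min_count:
            continue  # too little evidence about what follows `a`
        for b in clusters:
            if b == a:
                continue
            conf = pair[(a, b)] / total
            if conf >= pos_conf:
                positive[(a, b)] = conf
            elif conf <= neg_conf:
                negative[(a, b)] = conf
    return positive, negative
```

Under this reading, a cache might retain or prefetch results for the consequent of a positive rule and treat results named by a negative rule as better eviction candidates.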
international conference on data mining | 2008
Ling Chen; Yiqun Hu; Wolfgang Nejdl
In the past few years, there has been increased research interest in detecting previously unidentified events from Web resources. Our focus in this paper is to detect events from the click-through data generated by Web search engines. Existing event detection algorithms, which mainly study news archive data, cannot be employed directly because of the following two unique features of click-through data: 1) the information provided by click-through data is quite limited; 2) not every query issued to a Web search engine corresponds to an event in the real world. In this paper, we address this problem by proposing an effective algorithm which Detects Events from ClicK-through data (DECK). We first transform click-through data to the 2D polar space by considering the semantic dimension and temporal dimension of queries. Robust subspace estimation is performed to detect subspaces such that each subspace consists of queries of similar semantics. Next, we prune uninteresting subspaces which do not contain queries corresponding to real events by simultaneously considering the respective distribution of queries along the semantic dimension and the temporal dimension in each subspace. Finally, events are detected from interesting subspaces using a nonparametric clustering technique. Compared with an existing approach, our experimental results based on real-life data show that the proposed approach is more accurate and effective in detecting real events from click-through data.
database systems for advanced applications | 2006
Ling Chen; Sourav S. Bhowmick; Jinyan Li
Clustering web users is one of the most important research topics in web usage mining. Existing approaches cluster web users based on snapshots of web user sessions. They do not take into account the dynamic nature of web usage data. In this paper, we focus on discovering novel knowledge by clustering web users based on the evolution of their historical web sessions. We present an algorithm called COWES to cluster web users in three steps. First, given a set of web users, we mine the history of their web sessions to extract interesting patterns that capture the characteristics of their usage data evolution. Next, the similarity between web users is computed based on their common interesting patterns. Finally, the desired clusters are generated by a partitioning clustering technique. Web user clusters generated based on their historical web sessions are useful in intelligent web advertisement and web caching.
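The last two steps described above (pattern-based similarity, then partitioning) might look like the following sketch. It is not COWES itself: it assumes the first step has already produced a set of pattern identifiers per user, uses Jaccard similarity as a stand-in similarity measure, and uses a small deterministic k-medoids loop as the partitioning technique. All names are hypothetical.

```python
def jaccard(a, b):
    """Share of common patterns between two users (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_users(patterns, k, max_iters=20):
    """Partition users with k-medoids over 1 - Jaccard distances."""
    users = sorted(patterns)
    dist = {(u, v): 1.0 - jaccard(patterns[u], patterns[v])
            for u in users for v in users}
    medoids = users[:k]  # deterministic start: the first k users
    clusters = {}
    for _ in range(max_iters):
        # Assignment step: each user joins the nearest medoid.
        clusters = {m: [] for m in medoids}
        for u in users:
            nearest = min(medoids, key=lambda m: dist[(u, m)])
            clusters[nearest].append(u)
        # Update step: the member with the lowest total distance leads.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[(c, u)] for u in members))
            for members in clusters.values() if members
        )
        if new_medoids == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(sorted(members) for members in clusters.values() if members)
```

Users who share most of their evolution patterns end up in the same group, regardless of what their individual sessions looked like at any single point in time.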
database systems for advanced applications | 2009
Ling Chen; Sourav S. Bhowmick
Tree mining is very useful in domains such as bioinformatics, web mining, and semi-structured data mining. Existing efforts have largely assumed that the trees are static. However, in many real applications, tree data are evolutionary in nature. In this paper, we focus on mining evolution patterns from historical tree-structured data. Specifically, we propose a novel approach to discover negatively correlated subtree patterns (NECTARs) from a sequence of historical versions of unordered trees. The objective is to extract subtrees that are negatively correlated in undergoing structural changes. We propose an algorithm called NECTAR-Miner, based on a set of evolution metrics, to extract NECTARs. NECTARs can be useful in several applications such as maintaining mirrors of a website and maintaining XML path selectivity estimation. Extensive experiments show that the proposed algorithm has good performance and can discover NECTARs accurately.
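To give a feel for "negatively correlated in undergoing structural changes", the sketch below scores pairs of subtrees with the phi coefficient over binary change histories. This is a stand-in: the paper defines its own evolution metrics, which the abstract does not spell out, so the 0/1 encoding (1 means the subtree changed between two consecutive versions), the `threshold`, and the function names are assumptions.

```python
from itertools import combinations
from math import sqrt


def phi(x, y):
    """Phi coefficient of two equal-length 0/1 change histories."""
    n = len(x)
    both = sum(1 for a, b in zip(x, y) if a and b)
    x_changes, y_changes = sum(x), sum(y)
    denom = sqrt(x_changes * (n - x_changes) * y_changes * (n - y_changes))
    if denom == 0:
        return 0.0  # a subtree that always or never changes carries no signal
    return (n * both - x_changes * y_changes) / denom


def negatively_correlated(history, threshold=-0.5):
    """Return subtree pairs whose change histories correlate below threshold."""
    pairs = []
    for a, b in combinations(sorted(history), 2):
        score = phi(history[a], history[b])
        if score <= threshold:
            pairs.append((a, b, score))
    return pairs
```

A pair such as a news section and an archive section that never change in the same version would score -1.0, the strongest possible negative correlation under this measure.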